Preface


Statistical inference is concerned with unknowns in scientific investigations 
— physical constants, properties, relationships — unknowns whose effects 
can be discovered in whole or in part by the manipulation of some variables 
and the observation and measurement of others. Statistical inference is the 
theory that describes and prescribes the argument from observation and 
measurement to conclusion about the unknowns. Envisaged broadly, it 
encompasses the scientific method. Interpreted in familiar form, it tends 
to be restricted to observation and design with the basic variables already 
chosen and to instances of application that have significant variation beyond 
that generated by the manipulated variables. 

This book presents a unified theory of statistical inference. It is organized 
and developed as an introductory text on mathematical statistics. It pre- 
supposes a familiarity with elementary probability theory (a first course), 
with elementary vector analysis, and with multiple differentiation and 
integration (a second course in calculus). 

Statistical inference requires a statistical model — a model that describes 
the essential aspects of the process or experiment being investigated. Most 
processes and experiments contain sources of variation — identifiable sources 
that can be described by means of error variables; for example, the error 
in the operation of a measuring instrument, the variation in the raw material 
to a process, the variation in the interactions within a process, and the 
variation due to the randomization component of an experimental design. 
In such cases the statistical model must include the appropriate error variable. 

In many processes and experiments the observed response value is gener- 
ated by a simple kind of transformation of a realized error value. The first 
three chapters examine models that have an error variable and the simple 
kind of transformation: in Chapter One models for the direct measurement 
of a physical quantity; in Chapter Two the general model; and in Chapter 
Three a range of models for the indirect measurement of physical quantities.
The analysis and the inference apply equally to any error form and are not 
restricted to the traditional normal or Gaussian error distribution. 

As a by-product, the first three chapters produce much of the standard 
distribution theory for the classical statistical model; the classical model
neglects the error variable and describes only the response variable of the
process or experiment. The derivations do not employ the usual moment- 
generating functions and convolution formulas but rather an elementary 
device based on transformations. The method of derivation is simpler and 
applies to any error form, not just the normal. 

The middle three chapters examine models that have the error variables 
but also quantities that are not in direct correspondence with the simple 
kind of transformation. New methods of analysis are obtained, such as the 
method of marginal likelihood. And exact solutions are obtained for problems 
inaccessible by the traditional methods; for example, data transformations 
for the regression model in Chapter Four. 

As a by-product the middle three chapters produce the standard distri- 
bution-theory results of multivariate analysis. Again the results are obtained 
by the elementary device based on transformations and are available for 
any error form, not just the normal. 

The last three chapters examine inference for the classical statistical
model, for applications in which the error variable cannot be identified.
For a large number of observations the error-variable structure arises in the 
accessible model and the methods in the preceding chapters are available. 

The book contains material additional to a two-semester course in mathe- 
matical statistics. Appropriate sections for deletion on a first reading and 
more difficult problems are marked by an asterisk. Answers to selected prob- 
lems are recorded in an appendix. A solution booklet is available to attested 
instructors from Y. S. Lee and the author, Department of Mathematics, 
University of Toronto. 

The material in this book was developed over a ten-year period at the 
University of Toronto. The development was furthered by the opportunity 
to visit other universities and present and discuss various portions of the 
material: Stanford University, 1961-1962, the University of California, 
Berkeley, 1963, the University of Copenhagen, 1964, and the University of 
Wisconsin, 1965. The preparation of the final versions of the manuscript 
was made possible by support from the National Research Council of 
Canada. 

I value very much the help and advice of friends: Geoffrey S. Watson, who
read the preliminary and final manuscripts and gave crucial advice; M.
Safiul Haq and W. Keith Hastings, who examined and discussed the preliminary
manuscript; M. Masoom Ali, James Bondar, and Leonard Steinberg,
who read the final manuscript in fine detail and broad pattern; Andrew
Kalotay, Hans Levenbach, Leonard Steinberg, and Jim Whitney, who
worked closely on the development of sections in Chapters Two and Four;
Y. S. Lee, who carefully checked and solved the problems; Bob Montgomery,
who prepared the drawings in perspective; and Iris Martin and Mary
O’Rourke, who carefully typed the manuscript.


Donald Fraser 
CHAPTER ONE


Measurement Models 


The eighteenth and nineteenth centuries saw the gradual emergence of a new 
discipline, a discipline eventually to receive the name statistical inference. 
This new discipline arose as part of probability theory, but its problems 
were different — both in kind and in purpose. A typical problem of probability 
theory concerned gambling games involving randomness, and the purpose 
was to derive models and calculate probabilities. A typical problem of 
statistical inference concerned measurement error, and the purpose was to 
infer the values of the quantities being measured. The purpose, inference,
became the distinguishing feature of statistical inference, while the subject,
measurement error, receded from general attention. The neglect of measurement
error cannot be attributed to success of the theory in treating the topic.
Rather it indicates failure, the accumulated theory coming to partial agreement
only for very special cases such as with normally distributed error.

This chapter considers two kinds of problem involving measurement error. 
It finds an essential ingredient for a measurement model, an ingredient 
effectively absent in the accumulated theory; and with this ingredient in- 
cluded it derives the general solution for inference, a solution applicable to 
any error form, normal or nonnormal. More general problems involving 
indirect measurement are examined in Chapter Three. 

THE SIMPLE MEASUREMENT MODEL 
1 THE MODEL 

Consider an instrument $I$ for measuring a certain kind of physical quantity.
Suppose the operation of $I$ has been investigated; and suppose its error
pattern in repetitions has been found describable as independent realizations
of an error variable $e$ with probability element $f(e)\,de$ on the real line $R^1$
(see Figure 1). A value of the error variable gives the difference between a
reading of the instrument and a value of the quantity.

Figure 1 The error distribution of an instrument. 


Now consider the use of $I$ for a single measurement on a quantity. Let $x$ be
the value of the measurement and $\theta$ be the value of the quantity. The operation
of the instrument and the instance of measurement can then be described by
the model

$$
\begin{aligned}
& f(e)\,de, \\
& x = \theta + e.
\end{aligned}
$$

The model has two parts: an error distribution $f(e)\,de$ which describes the
operation of the instrument (with $e$ as a variable) and a structural equation
$x = \theta + e$ in which a realized value $e$ from the error distribution has
determined the relation between the value $x$ of the measurement and the value $\theta$
of the quantity (with $e$ as a constant). This is illustrated schematically in
Figure 2.

Now consider the use of the instrument for multiple measurements on a
physical quantity. The multiple operation of the instrument in a sequence of
$n$ operations has probability element $\prod f(e_i) \prod de_i$ on Euclidean space $R^n$.
Let $(x_1, \ldots, x_n)' = \mathbf{x}$ be the sequence of measurement values and $\theta$ be the
value of the quantity. The operation of the instrument and the $n$ instances of
measurement can then be described by the

Simple Measurement Model

$$
\begin{aligned}
& \prod_{1}^{n} f(e_i) \prod_{1}^{n} de_i, \\
& x_1 = \theta + e_1, \\
& \qquad \vdots \\
& x_n = \theta + e_n.
\end{aligned}
$$
Figure 2 The simple measurement model, n = 1.

The model has two parts: an error distribution $\prod f(e_i) \prod de_i$ which describes
the multiple operation of the measuring instrument (with $e$'s as variables),
and a structural equation $\mathbf{x} = \theta\mathbf{1} + \mathbf{e}$ (in vector notation) in which a realized
vector $\mathbf{e}$ from the error distribution has determined the relation between the
known measurement vector $\mathbf{x}$ and the unknown value $\theta$ of the quantity (with
$\mathbf{e}$ as a constant).

2 THE TRANSFORMATIONS 

In the structural equation $x = \theta + e$ a realized error value $e$ is translated
by an amount $\theta$. In the modified equation $\theta = x - e$, a reverse error value
$-e$ is translated by an amount $x$. Translations such as these are integral to
the use of the instrument.

Consider notation for translations, notation general enough to cover the
rescalings of interest in later sections. Let $[a, c]$ be the affine transformation
($c \neq 0$) on $R^1$,

$$[a, c]x = a + cx,$$

or the corresponding affine transformation on $R^n$,

$$[a, c]\mathbf{x} = a\mathbf{1} + c\mathbf{x}.$$

The composition or product of two affine transformations is affine:

$$[A, C][a, c]x = A + Ca + Ccx,$$

$$[A, C][a, c] = [A + Ca, Cc].$$

Note that the product depends on the order of the component transformations.
The identity transformation is affine:

$$[0, 1][a, c] = [a, c] = [a, c][0, 1].$$

And the inverse of an affine transformation is affine:

$$[a, c]^{-1} = [-c^{-1}a, c^{-1}],$$

$$[-c^{-1}a, c^{-1}][a, c] = [0, 1] = [a, c][-c^{-1}a, c^{-1}].$$
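The product, identity, and inverse rules for affine transformations can be checked numerically. The following is a minimal sketch, not part of the original text: the class name `Affine`, the use of `@` for the product, and the sample values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine:
    """The affine transformation [a, c]: x -> a + c*x, with c != 0.

    Hypothetical helper; the name and interface are assumptions.
    """
    a: float
    c: float

    def __post_init__(self):
        if self.c == 0:
            raise ValueError("c must be nonzero")

    def __call__(self, x):
        # Apply the transformation to a real number x.
        return self.a + self.c * x

    def __matmul__(self, other):
        # Product rule: [A, C][a, c] = [A + C*a, C*c].
        return Affine(self.a + self.c * other.a, self.c * other.c)

    def inverse(self):
        # Inverse rule: [a, c]^{-1} = [-a/c, 1/c].
        return Affine(-self.a / self.c, 1.0 / self.c)


IDENTITY = Affine(0.0, 1.0)

g = Affine(3.0, 2.0)
h = Affine(-1.0, 0.5)

# The product acts like "apply h, then g".
print((g @ h)(4.0), g(h(4.0)))
# The product depends on the order of the factors.
print(g @ h, h @ g)
# The inverse undoes the transformation on either side.
print(g @ g.inverse(), g.inverse() @ g)
```

Restricting to `c = 1` gives the translations `[a, 1]`, for which the product reduces to adding the `a` values and the order no longer matters.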
A set of transformations that is closed under the formation of products
and inverses has the algebraic structure of a group:

Definition 1. A set $G$ is a group if (i) for each pair $(g_1, g_2)$ of elements of
$G$, there is an element $g_1 g_2$ of $G$ called the product of $g_1$ and $g_2$; (ii) for each
triple $(g_1, g_2, g_3)$ of elements of $G$, $(g_1 g_2) g_3 = g_1 (g_2 g_3)$ (associativity); (iii)
there is an element $i$ of $G$ called the identity with the property $ig = gi = g$ for
each element $g$ of $G$; (iv) for each element $g$ of $G$ there is an element $g^{-1}$
of $G$ with the property $g g^{-1} = g^{-1} g = i$.

The affine transformations

$$\{[a, c]: -\infty < a < \infty,\ c \neq 0\}$$

form a group, the affine group on $R^1$.

The translations on $R^1$ or the corresponding translations on $R^n$ have the
form

$$[a, 1]x = a + x, \qquad [a, 1]\mathbf{x} = a\mathbf{1} + \mathbf{x}.$$

These transformations form the location group on $R^1$:

$$G = \{[a, 1]: -\infty < a < \infty\};$$

the group properties are

$$[A, 1][a, 1] = [A + a, 1],$$

$$[a, 1]^{-1} = [-a, 1], \qquad i = [0, 1].$$

The simple measurement model can now be re-expressed by using the
transformation notation:

$$
\begin{aligned}
& \prod_{1}^{n} f(e_i) \prod_{1}^{n} de_i, \\
& \mathbf{x} = [\theta, 1]\mathbf{e}.
\end{aligned}
$$

3 THE ORBITS 

Consider how the location group $G$ affects Euclidean space $R^n$. The
translations $[a, 1]$ carry a point $\mathbf{x}$ into the points $a\mathbf{1} + \mathbf{x}$ (see Figure 3). These
points form the orbit of $\mathbf{x}$ under the location group:

$$G\mathbf{x} = \{[a, 1]: -\infty < a < \infty\}\mathbf{x} = \{a\mathbf{1} + \mathbf{x}: -\infty < a < \infty\}.$$

The orbit that passes through the origin $\mathbf{0}$ has the special form

$$G\mathbf{0} = \{a\mathbf{1}: -\infty < a < \infty\};$$

it is a one-dimensional linear subspace, the extended 1-vector. The general
orbit $G\mathbf{x}$ can be formed by an $\mathbf{x}$-translation of $G\mathbf{0}$:

$$G\mathbf{x} = G\mathbf{0} + \mathbf{x} = \{a\mathbf{1} + \mathbf{x}\}.$$

Figure 4 A translation [a, 1] of three points.


As examples consider $\bar{x} = \sum x_i/n$, $\max x_i$, $x_1$, $(\max x_i + \min x_i)/2$, $2x_3 - x_1$,
$6 \max x_i - 5 \min x_i$ (see Figure 4). A location variable is linear in a restricted
sense: linear along any orbit.

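A location variable $r(\mathbf{x})$ leads to a reference point $\mathbf{d}(\mathbf{x})$ on each orbit, the
point at which the variable takes the value 0:

$$
\begin{aligned}
\mathbf{d}(\mathbf{x}) &= [r(\mathbf{x}), 1]^{-1}\mathbf{x} \\
&= -r(\mathbf{x})\mathbf{1} + \mathbf{x} \\
&= (x_1 - r(\mathbf{x}), \ldots, x_n - r(\mathbf{x}))';
\end{aligned}
$$

$$
\begin{aligned}
r(\mathbf{d}) &= r(-r(\mathbf{x})\mathbf{1} + \mathbf{x}) \\
&= -r(\mathbf{x}) + r(\mathbf{x}) = 0.
\end{aligned}
$$

The reference points $\mathbf{d}(\mathbf{x})$ index the orbits $G\mathbf{x}$ (see Figures 3 and 4); each
orbit has exactly one reference point.

Each point $a\mathbf{1} + \mathbf{d}$ on the orbit through a reference point $\mathbf{d}$ has a different
position:

$$r(a\mathbf{1} + \mathbf{d}) = a + r(\mathbf{d}) = a.$$

Note that $r(\mathbf{x})$ measures position using the reference point as origin and
the 1-vector as unit.

The general point $\mathbf{x}$ can be reconstructed from its orbit and its position,

$$\mathbf{x} = [r(\mathbf{x}), 1]\mathbf{d}(\mathbf{x});$$

for example, with $r(\mathbf{x}) = x_1$:

$$
\begin{aligned}
\mathbf{d}(\mathbf{x}) &= (d_1(\mathbf{x}), \ldots, d_n(\mathbf{x}))' \\
&= (0, x_2 - x_1, \ldots, x_n - x_1)', \\
\mathbf{x} &= [x_1, 1](0, x_2 - x_1, \ldots, x_n - x_1)'.
\end{aligned}
$$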
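The decomposition of a point into its orbit (reference point) and its position can be tried on a small vector. This is an illustrative sketch only: the function names are hypothetical, and a location variable is taken to be any function with r(a1 + x) = a + r(x), the property used in the identities above.

```python
def translate(a, x):
    """Apply the translation [a, 1] to the vector x, giving a*1 + x."""
    return [a + xi for xi in x]


def reference_point(r, x):
    """d(x) = [r(x), 1]^{-1} x = x - r(x)*1 for a location variable r."""
    return translate(-r(x), x)


# Three location variables (hypothetical names for standard choices).
def first(x):
    return x[0]


def mean(x):
    return sum(x) / len(x)


def midrange(x):
    return (max(x) + min(x)) / 2


x = [2.0, 7.0, 3.0]
shifted = translate(10.0, x)  # another point on the same orbit

for r in (first, mean, midrange):
    d = reference_point(r, x)
    # The reference point has position 0 and is shared by the whole orbit.
    print(r.__name__, d, r(d), reference_point(r, shifted))
    # The point is recovered from its position and its reference point.
    print(translate(r(x), d))

# Two location variables differ by a constant along an orbit.
print(mean(x) - first(x), mean(shifted) - first(shifted))
```

Different location variables pick different reference points on the same orbit, but each one labels the orbit uniquely.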
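Two location variables differ in value by a constant along any orbit:

$$r_2(a\mathbf{1} + \mathbf{x}) - r_1(a\mathbf{1} + \mathbf{x}) = r_2(\mathbf{x}) - r_1(\mathbf{x}) = r_2(\mathbf{d}_1(\mathbf{x})).$$

Now consider the simple measurement model:

$$
\begin{aligned}
& \prod f(e_i) \prod de_i, \\
& \mathbf{x} = [\theta, 1]\mathbf{e}.
\end{aligned}
$$

And let $r(\mathbf{x})$ be a location variable. The points $\mathbf{x}$ and $\mathbf{e}$ are on the same orbit:

$$G\mathbf{x} = G[\theta, 1]\mathbf{e} = G\mathbf{e} \quad \text{or} \quad \mathbf{d}(\mathbf{x}) = \mathbf{d}(\mathbf{e}).$$

The positions of the points $\mathbf{x}$ and $\mathbf{e}$ differ by a translation $[\theta, 1]$:

$$r(\mathbf{x}) = [\theta, 1]r(\mathbf{e}).$$

The simple measurement model can then be rewritten with a composite
structural equation:

$$
\begin{aligned}
& \prod f(e_i) \prod de_i, \\
& r(\mathbf{x}) = [\theta, 1]r(\mathbf{e}), \qquad G\mathbf{x} = G\mathbf{e}.
\end{aligned}
$$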

4 HOMOGENEITY 

Consider a transformation $[a, 1]$ on the axis of measurement,

$$\tilde{\mathbf{x}} = [a, 1]\mathbf{x}, \qquad \tilde{\theta} = [a, 1]\theta;$$

and view the transformation as providing new coordinates for given points.
See Figure 5. The transformation does not affect the physical problem of
measurement; it affects only the numerical representation of the values
involved.

Figure 5 A change of coordinates [a, 1].
Consider how the change of coordinates affects the simple measurement
model:

$$
\begin{aligned}
& \prod f(e_i) \prod de_i, \\
& \mathbf{x} = [\theta, 1]\mathbf{e}.
\end{aligned}
$$

The structural equation can be expressed in terms of the new coordinates:
multiply by $[a, 1]$ and simplify using $[a, 1][\theta, 1] = [\tilde{\theta}, 1]$. The equation
becomes

$$\tilde{\mathbf{x}} = [\tilde{\theta}, 1]\mathbf{e}.$$

Thus the model as expressed in terms of the new coordinates is

$$
\begin{aligned}
& \prod f(e_i) \prod de_i, \\
& \tilde{\mathbf{x}} = [\tilde{\theta}, 1]\mathbf{e}.
\end{aligned}
$$

The form of the model is the same as before the change of coordinates. The 
physical problem is untouched by a transformation [a, 1]. Reflecting this, 
the model has the same form after the transformation as before. The simple 
measurement model is homogeneous under the location group. 

The homogeneity can be pictured in terms of the axis of measurement: the 
measurement model as viewed from one point on the axis has the same form 
as when viewed from any other point on the axis. 

5 PROBABILITIES FOR AN UNKNOWN CONSTANT 

In applications of probability theory it is common to make probability 
statements concerning unknown constants. Consider briefly the conditions 
for such statements. 

As an illustration suppose a deck of 52 playing cards is thoroughly
shuffled and two cards are dealt face down on a table. The designations on the
faces of the two cards are fixed; the designations are unknown constants.
An observer can make probability statements concerning the unknown
constants; for example,

$$\Pr\{2\ \text{spades}\} = \tfrac{13}{52} \cdot \tfrac{12}{51}.$$

Such statements are based on the random process that generated the unknown 
constants. 

Now suppose two more cards are dealt from the deck face down on the
table, and suppose the observer examines these cards and finds the first to be
a spade and the second a nonspade. The observer can then make revised
probability statements concerning the unknown constants; for example,

$$\Pr\{2\ \text{spades}\} = \frac{\tfrac{13}{52} \cdot \tfrac{12}{51} \cdot \tfrac{11}{50} \cdot \tfrac{39}{49}}{\tfrac{13}{52} \cdot \tfrac{39}{51}} = \tfrac{12}{50} \cdot \tfrac{11}{49}.$$


Such statements are based on the random process as conditioned by the ob- 
served event. 
Alternatively, suppose the second pair of cards is kept face down and
passed to a participant in an adjacent room, and suppose the participant
reports the item of information, “There’s a spade here.” The observer might
then make the statement

$$\Pr\{2\ \text{spades}\} = \tfrac{12}{51} \cdot \tfrac{11}{50},$$

if he thought the participant had examined only the first card. Or he might
make the statement

$$\Pr\{2\ \text{spades}\} = \frac{\tfrac{13}{52} \cdot \tfrac{12}{51} \left(2 \cdot \tfrac{11}{50} \cdot \tfrac{39}{49}\right)}{2 \cdot \tfrac{13}{52} \cdot \tfrac{39}{51}} = \tfrac{12}{50} \cdot \tfrac{11}{49},$$

if he thought the participant had examined both cards and would have
reported two spades if there were two spades. The two statements are
contradictory.
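The card probabilities in this section can be reproduced with exact fractions. The sketch below is an illustration under stated assumptions: the variable names are hypothetical, and the report “There’s a spade here” is read either as the event “the first card of the second pair is a spade” or as the event “the second pair contains exactly one spade”, following the two interpretations described above.

```python
from fractions import Fraction as F

# Pr{first pair is 2 spades} with no other information.
p_prior = F(13, 52) * F(12, 51)

# Second pair examined and found to be (spade, nonspade):
# joint probability divided by the probability of the observed event.
joint_seen = F(13, 52) * F(12, 51) * F(11, 50) * F(39, 49)
event_seen = F(13, 52) * F(39, 51)
p_seen = joint_seen / event_seen

# Report read as "the first card of the second pair is a spade".
joint_first = F(13, 52) * F(12, 51) * F(11, 50)
event_first = F(13, 52)
p_first_only = joint_first / event_first

# Report read as "the second pair has exactly one spade"
# (two spades would have been reported as such).
joint_one = F(13, 52) * F(12, 51) * 2 * F(11, 50) * F(39, 49)
event_one = 2 * F(13, 52) * F(39, 51)
p_exactly_one = joint_one / event_one

print(p_prior, p_seen, p_first_only, p_exactly_one)
```

The two readings of the same report give different values, which is the contradiction noted in the text: the probability depends on the event behind the report, not on the wording of the report.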

For this alternative situation an exact probability statement for the value 
of Pr {2 spades} cannot be made. The item of information, “There’s a spade 
here,” could have been presented for each possible second pair having one or 
more spades. For exact probability statements it is necessary to know exactly 
those second pairs for which the item of information would have been 
presented. Information needs to be in the form of an event, the set of possible 
outcomes for which the information would have been presented. The item of 
information, “There’s a spade here,” has the form of a deduction from an 
event unknown to the observer. 


The example illustrates sufficient conditions for making probability state- 
ments concerning unknown constants : (i) The constants were generated as 
realized values from a random process with known probability characteristics. 
(ii) The only other information concerning the unknown constants has the form 
of an event for the random process that generated the constants. 


6 REDUCTION 


Consider the measurement of a physical quantity. Let $x_1, \ldots, x_n$ be the
measurements and $\theta$ the value of the quantity; and suppose the simple
measurement model is applicable:

$$
\begin{aligned}
& \prod f(e_i) \prod de_i, \\
& r(\mathbf{x}) = [\theta, 1]r(\mathbf{e}), \qquad G\mathbf{x} = G\mathbf{e}.
\end{aligned}
$$

The error distribution $\prod f(e_i) \prod de_i$ on $R^n$ describes the operation of the
measuring instrument; it describes the random process that generated the
realized errors $e_1, \ldots, e_n$ in the structural equation. The structural equation
in composite form gives the relation between the known values $x_1, \ldots, x_n$
and the unknown values $\theta, e_1, \ldots, e_n$.

Figure 6 The known orbit of e and an adjacent bundle of orbits.

Suppose there is no other information concerning the unknowns; this can
occur minimally if the measurement process is being examined in isolation to
determine what information it alone supplies about the value $\theta$. Now consider
the information in the structural equation concerning the unknown error
vector $\mathbf{e}$.

The orbit of $\mathbf{e}$ is known: $G\mathbf{e} = G\mathbf{x}$. Or if $\mathbf{x}$ is known only to a certain
accuracy, then the orbit of $\mathbf{e}$ is one of a bundle of orbits, as indicated in
Figure 6. The information about the orbit of $\mathbf{e}$ is in the form of an event
for the process that generated $\mathbf{e}$, an event based on the partition of $R^n$ into
orbits.

Consider the position of $\mathbf{e}$ on its orbit. The position part of the structural
equation can be solved:

$$r(\mathbf{e}) = [\theta, 1]^{-1} r(\mathbf{x}).$$

The position of $\mathbf{e}$ is described as an unknown translation, $g = [\theta, 1]^{-1}$, of
the known position $r(\mathbf{x})$:

$$r(\mathbf{e}) = g\,r(\mathbf{x});$$

the error position is not known. If the known position $r(\mathbf{x})$ were different,
$[a, 1]r(\mathbf{x})$ for example, then the structural equation would describe $r(\mathbf{e})$ as

$$r(\mathbf{e}) = g[a, 1]r(\mathbf{x}) = h\,r(\mathbf{x}),$$

where $h = g[a, 1]$ is also an unknown translation (homogeneity of the model).
Different values of the position $r(\mathbf{x})$ would provide the same description for
$r(\mathbf{e})$. There is thus no information from the structural equation concerning
the location of $\mathbf{e}$ on its orbit.


In Summary. The error distribution describes the random process that 
generated the unknown e in the structural equation. The only other informa- 
tion concerning the unknown e has the form Ge = Gx, an event for the 
random process that generated e. The conditions are fulfilled — exact prob- 
ability statements can be made concerning the unknown error e; they are based 
on the conditional distribution of the error variable e given the orbit Ge = Gx. 


7 THE REDUCED MODEL 

Consider the derivation of the conditional distribution of the error variable
$\mathbf{e}$ given the orbit $G\mathbf{e}$. On any orbit, two location variables differ in value by a
constant. It suffices to work with a simple choice; take $r(\mathbf{e}) = e_1$. The
corresponding d-vector has coordinates

$$
\begin{aligned}
d_1(\mathbf{e}) &= e_1 - e_1 = 0, \\
&\ \ \vdots \\
d_n(\mathbf{e}) &= e_n - e_1.
\end{aligned}
$$

The required conditional distribution is then the distribution of $e_1$ given
$d_2, \ldots, d_n$.

The probability element for $\mathbf{e}$ is

$$\prod_{1}^{n} f(e_i) \prod_{1}^{n} de_i.$$

The Jacobian determinant of $(e_1, d_2, \ldots, d_n)$ with respect to $(e_1, \ldots, e_n)$ is

$$
J = \begin{vmatrix}
1 & 0 & 0 & \cdots & 0 \\
-1 & 1 & 0 & \cdots & 0 \\
-1 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
-1 & 0 & 0 & \cdots & 1
\end{vmatrix} = 1;
$$
hence the probability element for $(e_1, d_2, \ldots, d_n)$ is

$$\prod_{1}^{n} f(e_1 + d_i)\, de_1\, dd_2 \cdots dd_n.$$

The marginal probability element for $(d_2, \ldots, d_n)$ is

$$h(d_2, \ldots, d_n)\, dd_2 \cdots dd_n = \int_{-\infty}^{\infty} \prod_{1}^{n} f(t + d_i)\, dt \cdot dd_2 \cdots dd_n;$$

hence the conditional probability element for $e_1$ given $d_2, \ldots, d_n$ is

$$
\begin{aligned}
g(e_1 : \mathbf{d})\, de_1 &= \frac{\prod f(e_1 + d_i)}{h(d_2, \ldots, d_n)}\, de_1
= \frac{\prod f(e_1 + d_i)}{\int_{-\infty}^{\infty} \prod_{1}^{n} f(t + d_i)\, dt}\, de_1 \\
&= k(\mathbf{d}) \prod f(e_1 + d_i)\, de_1.
\end{aligned}
$$

The denominator in the middle two expressions serves only as a normalizing
constant.


The denominator in the middle two expressions serves only as a normalizing 
constant. 

Let $r^* = r^*(\mathbf{e})$ be an alternative location variable:

$$e_i = e_1 + d_i = r^* + d_i^*.$$

The conditional probability element can be reexpressed in terms of the new
variable:

$$k^*(\mathbf{d}^*) \prod_{1}^{n} f(r^* + d_i^*)\, dr^*.$$


The distribution has the same form as before ; and the normalizing constant 
has the same value — but is expressed in terms of the new reference point. 

The conditional distribution described in the preceding section has now 
been derived. The simple measurement model by its own information content 
produces the 


Reduced Simple Measurement Model 

$$
\begin{aligned}
& g(r : \mathbf{d}(\mathbf{x}))\, dr, \\
& r(\mathbf{x}) = \theta + r.
\end{aligned}
$$

The reduced model has two parts: an error probability distribution $g(r : \mathbf{d}(\mathbf{x}))\, dr$
on $R^1$ (with $r$ as a variable) which provides probability statements for the
unknown error position $r$ in the structural equation; and a structural equation
which gives the relation between the known $r(\mathbf{x})$ and the unknowns $\theta$ and
$r$ (with $r$ as a constant).
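For an arbitrary error density the conditional distribution of the error position can be obtained numerically. The following is a minimal sketch under stated assumptions, not the book's procedure: it takes the location variable to be the first coordinate, approximates the normalizing integral by Simpson's rule on a finite range (adequate only when the density is negligible outside that range), and uses hypothetical function names throughout.

```python
import math


def simpson(func, lo, hi, n):
    """Composite Simpson's rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3


def reduced_density(f, x, lo=-10.0, hi=10.0, n=4000):
    """Return g(e1 : d), proportional to prod_i f(e1 + d_i), d_i = x_i - x_1.

    The integration range [lo, hi] is an assumption of this sketch.
    """
    d = [xi - x[0] for xi in x]

    def unnormalized(t):
        return math.prod(f(t + di) for di in d)

    k = 1.0 / simpson(unnormalized, lo, hi, n)
    return lambda t: k * unnormalized(t)


def normal_pdf(e, sigma=0.6):
    """Normal error density with mean 0 and standard deviation sigma."""
    return math.exp(-e * e / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


# Illustrative data: four measurements with normal error, sigma = 0.6.
x = [62.0, 60.5, 60.7, 61.6]
g = reduced_density(normal_pdf, x)

total = simpson(g, -10.0, 10.0, 4000)
# With normal error, e1 given d is normal with mean -mean(d) = 0.8
# and standard deviation 0.6 / sqrt(4) = 0.3.
p_one_sd = simpson(g, 0.5, 1.1, 200)
print(total, p_one_sd)
```

Any other error density can be passed in place of `normal_pdf`; only the integration range may need widening for heavy-tailed errors.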
8 EXAMPLES

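As a first example consider the simple measurement model with error
variable normally distributed with mean 0 and variance $\sigma_0^2$:

$$
\begin{aligned}
& \prod f(e_i) \prod de_i = (2\pi\sigma_0^2)^{-n/2} \exp\Big\{-\frac{1}{2\sigma_0^2} \sum e_i^2\Big\} \prod de_i, \\
& \mathbf{x} = \theta\mathbf{1} + \mathbf{e}.
\end{aligned}
$$

The location variable $\bar{x}$ is particularly suited to the case of normal error. The
conditional error distribution of $\bar{e}$ given $\mathbf{d}(\mathbf{e}) = (e_1 - \bar{e}, \ldots, e_n - \bar{e})' = \mathbf{d}$ is

$$
\begin{aligned}
g(\bar{e})\, d\bar{e} &= k'(2\pi\sigma_0^2)^{-n/2} \exp\Big\{-\frac{1}{2\sigma_0^2} \sum (\bar{e} + d_i)^2\Big\}\, d\bar{e} \\
&= k'' \exp\Big\{-\frac{n\bar{e}^2}{2\sigma_0^2}\Big\}\, d\bar{e} \\
&= \Big(\frac{2\pi\sigma_0^2}{n}\Big)^{-1/2} \exp\Big\{-\frac{n\bar{e}^2}{2\sigma_0^2}\Big\}\, d\bar{e}.
\end{aligned}
$$

The first step in the simplification uses $\sum d_i = 0$ in

$$
\begin{aligned}
\sum (\bar{e} + d_i)^2 &= n\bar{e}^2 + 2\bar{e} \sum d_i + \sum d_i^2 \\
&= n\bar{e}^2 + \sum d_i^2
\end{aligned}
$$

and incorporates the contribution from $\sum d_i^2$ into the constant $k''$; the second
step supplies the necessary normalizing constant for the normal distribution.
The reduced model is thus

$$
\begin{aligned}
& g(\bar{e})\, d\bar{e} = \Big(\frac{2\pi\sigma_0^2}{n}\Big)^{-1/2} \exp\Big\{-\frac{n\bar{e}^2}{2\sigma_0^2}\Big\}\, d\bar{e}, \\
& \bar{x} = \theta + \bar{e};
\end{aligned}
$$

this can be expressed equivalently as

$$
\begin{aligned}
& \bar{e} = \frac{\sigma_0}{\sqrt{n}}\, z, \\
& \bar{x} = \theta + \bar{e},
\end{aligned}
$$

where $z$ designates a standard normal variable. The error distribution for the
location variable $\bar{e}$ has the interesting property that it does not depend on
the values of the deviations $d_i$; the distribution is the same on each orbit.
Suppose that $\sigma_0 = 0.6$; and suppose the measurements are

62.0, 60.5, 60.7, 61.6.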
The simple measurement model is

$$
\begin{aligned}
e_1 &= 0.6z_1, & 62.0 &= \theta + e_1, \\
e_2 &= 0.6z_2, & 60.5 &= \theta + e_2, \\
e_3 &= 0.6z_3, & 60.7 &= \theta + e_3, \\
e_4 &= 0.6z_4, & 61.6 &= \theta + e_4;
\end{aligned}
$$

and the reduced model is

$$
\begin{aligned}
& \bar{e} = 0.3z, \\
& 61.2 = \theta + \bar{e}.
\end{aligned}
$$

The probability distribution describing the unknown $\bar{e}$ is normal with mean 0
and standard deviation 0.3. Some probability statements concerning $\bar{e}$ are

$$\Pr\{-0.3 < \bar{e} < 0.3\} = 68\tfrac{1}{4}\%,$$

$$\Pr\{-0.6 < \bar{e} < 0.6\} = 95\tfrac{1}{2}\%.$$
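The numbers in the normal example can be recomputed directly. This is a small sketch using Python's standard library; the variable names are illustrative, and the error standard deviation 0.6 and the four measurements are taken from the example.

```python
import math
from statistics import NormalDist, fmean

sigma0 = 0.6                       # error standard deviation from the example
x = [62.0, 60.5, 60.7, 61.6]       # the four measurements
n = len(x)

xbar = fmean(x)                    # location variable: the sample mean
scale = sigma0 / math.sqrt(n)      # standard deviation of the mean error

# Reduced error distribution for the mean error: normal(0, scale).
err = NormalDist(0.0, scale)
p_one_sd = err.cdf(0.3) - err.cdf(-0.3)
p_two_sd = err.cdf(0.6) - err.cdf(-0.6)

print(xbar, scale, p_one_sd, p_two_sd)
```

The two probabilities come out near 0.683 and 0.954, in line with the rounded percentages quoted in the text.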

As a second example consider the simple measurement model with an
error variable that has a Cauchy distribution in standard form; and suppose
there are two measurements, 165.1, 161.1, on the quantity $\theta$:

$$
\begin{aligned}
165.1 &= \theta + e_1, \\
161.1 &= \theta + e_2.
\end{aligned}
$$

The location variable $x_1$ is as convenient as any; the corresponding d-vector
is $\mathbf{d} = (0, -4.0)'$. The reduced model is

$$
\begin{aligned}
& g(e_1)\, de_1 = k\, \frac{1}{1 + e_1^2} \cdot \frac{1}{1 + (e_1 - 4)^2}\, de_1, \\
& 165.1 = \theta + e_1.
\end{aligned}
$$

The conditional error distribution is plotted in Figure 7. The constant can
be obtained by numerical integration from the graph itself. Probability
statements concerning the unknown $e_1$ can also be derived from the graph;
for example,

$$\Pr\{-1 < e_1 < 5\} \doteq 89.1\%.$$
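The numerical integration mentioned for the Cauchy example can be sketched in a few lines. The code assumes the conditional density is proportional to 1/((1 + e1^2)(1 + (e1 - 4)^2)), as follows from d = (0, -4.0)' and the standard Cauchy form; the Simpson integrator, the finite integration range, and the names are assumptions of this illustration, not the book's method.

```python
def simpson(func, lo, hi, n):
    """Composite Simpson's rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3


def unnormalized(e1, d2=-4.0):
    """Product of standard Cauchy kernels at e1 + d_i, with d = (0, -4.0)'."""
    return 1.0 / ((1.0 + e1**2) * (1.0 + (e1 + d2) ** 2))


# The integrand decays like 1/e1^4, so a wide finite range loses
# only a negligible part of the total mass.
k = 1.0 / simpson(unnormalized, -200.0, 204.0, 40400)

# Probability that the error position lies between -1 and 5.
prob = k * simpson(unnormalized, -1.0, 5.0, 600)

print(k, prob)
```

The interval from -1 to 5 is centered on 2, the midpoint between the two modes-defining points 0 and 4, so it covers the bulk of the conditional distribution.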

The simple measurement model has been developed with the measurement 
process as illustration. The range of applications, however, is much broader. 
Figure 7 The error probability distribution for the Cauchy example.


A typical application has a sequence of observations or measurements on a 
response variable. The response variable is based on a process or system 
operating under stable conditions. Separate operations of the system are 
statistically independent as a consequence of separation in time, or separation 
in space, or separation in entity. The internal pattern of variation or error 
as it affects the response variable is known from earlier experience, and the 
spread of this error pattern is also known (or a more general model in Section 
11 is needed). These characteristics of the typical application require a scale 
of measurement, and they also require a unit of measurement; they do not 
require an origin or zero point on the measurement scale. 

Variation in a response variable can generally be attributed to a variety
of sources: variation in the material being processed, variation in the internal 
operation of the process, variation due to the randomization ingredient of 
experimental design. The combined sources of variation form the internal 
error of the system; the composite effect of this error produces the variation 
or error that affects the response. The typical application requires that 
external conditions of the process be controlled and that sequencing of 
observations be randomized against possible external sources of variation. 
This procedure can provide the basis on which the internal error of the system 
has stability and the composite effect of this error has known form. The 
internal error of the system is the random process referred to in the development
of the simple measurement model; and the composite effect of this error
is the error variable described by the model.

In the typical application the medial or general level of the response is the 
quantity being investigated or measured. This quantity can have numerical 
definition by comparison with standard levels for similar variables. A zero 
point on the measurement scale may be chosen for convenience. In a typical 
process the general response level depends on the levels of input variables to 
the process. For the applications considered here, these input levels are kept 
constant and are part of the conditions of the system (more general models
that describe patterns of dependence on input variables are available in
Chapter Three). The general response level is a consequence of the
conditions of the system; it has an identity distinct from the internal error of
the system. The general response level is the quantity $\theta$ in the simple
measurement model. The quantity $\theta$ gives response expression to the internal error
values $e_1, \ldots, e_n$. The response expression is in the form of response
observations or determinations $x_1, \ldots, x_n$, the measurements in the simple
measurement model.

9 TESTS OF SIGNIFICANCE

In an application of the simple measurement model an outside source may
indicate a value $\theta_0$ for the quantity being measured. The outside source could
be a preceding investigation on a similar quantity; the purpose of the application
would then be to see whether the value of the quantity is the value
indicated by that earlier investigation.

Alternatively, the outside source could be a theory linking a variety of
physical quantities. The theory, perhaps in conjunction with values for some
of the physical quantities, may prescribe a value $\theta_0$ for the quantity being
measured. The purpose of the application would then be to see whether the
quantity being measured has the value $\theta_0$, to check thereby whether the
theory is adequate for the particular kind of prediction, and to check thereby
on the validity of the theory.

As an illustration consider the first example in the preceding section. The
reduced model is

$$
\begin{aligned}
& \bar{e} = 0.3z, \\
& 61.2 = \theta + \bar{e},
\end{aligned}
$$

where $z$ designates a standard normal variable. Suppose that some outside
source has prescribed the value $\theta_0 = 62.4$ for the quantity $\theta$. The hypothesis
$\theta = 62.4$ leads to the value

$$\bar{e} = 61.2 - 62.4 = -1.2 = -4(0.3)$$

for the error position $\bar{e}$. This value for $\bar{e}$ is $-4$ standard deviations from the
center of the normal error distribution. The probability of a value so far or
farther from the center of the distribution is extremely small:

$$\Pr\{|\bar{e} - 0| > 1.2\} = \Pr\{|z| > 4\} = 0.000{,}064.$$

The value $-1.2$ for error position is thus almost inconsistent with the error
probability distribution; in the framework of the model it suggests strongly
that the hypothesis is not true, and in turn that a theory that produced the
hypothesis is not true. An additional sequence of measurements on $\theta$ might
strengthen or reverse the assessment.
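The tail probability in this illustration can be recomputed as follows. The sketch assumes a symmetric error distribution centered at 0, so that the two-sided tail is twice the lower tail; the function name and signature are hypothetical.

```python
from statistics import NormalDist


def significance_level(r_x, theta0, error_dist):
    """Two-sided tail probability for the error position r = r(x) - theta0.

    Assumes error_dist is symmetric about 0 and exposes a cdf method.
    """
    r = r_x - theta0
    return r, 2.0 * error_dist.cdf(-abs(r))


# Reduced model of the normal example: error position 0.3 z, r(x) = 61.2.
r, level = significance_level(61.2, 62.4, NormalDist(0.0, 0.3))
print(r, level)
```

The computed level is about 0.00006, the same order as the value quoted above; a hypothesized value nearer 61.2 would give a correspondingly larger level.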

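Now consider the reduced simple measurement model

$$
\begin{aligned}
& g(r)\, dr, \\
& r(\mathbf{x}) = \theta + r,
\end{aligned}
$$

and let $\theta_0$ be a value prescribed for $\theta$ by some outside source. The hypothesis
$\theta = \theta_0$ leads to the value

$$r = -\theta_0 + r(\mathbf{x})$$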

for the error position r. This value for r can be compared with the probability 
distribution g(r) dr for error position. A value in a broad central range of the 
distribution is a value in accord with the error distribution: the measurements 
are in accord or agreement with the hypothesis. A value in the extremes of 
the distribution is an unlikely value for the error distribution. Its significance 
can be assessed in part, as in the example, by calculating the level of signifi- 
cance: the probability of as great or greater departure from the center of the 
distribution. Within the framework of the model, an extreme value provides 
evidence against the hypothesis, and a value effectively beyond the range of 
the distribution effectively denies the hypothesis. 

10 GENERAL INFERENCE 

A primary need for statistical inference is the ability to extract information 
concerning an unknown quantity. A model, as it describes a system being 
investigated, contains the information about that system. Sometimes outside 
sources may also provide information. These two kinds of information 
should in general be kept separate, any combining being left to judgment and 
expediency on occasions when the information is used. Consider the simple
measurement model and the information it contains concerning the unknown
quantity.

Consider first the example at the beginning of Section 8. The reduced
model is

$$
\begin{aligned}
& g(\bar{e})\, d\bar{e} = \frac{1}{\sqrt{2\pi}\, 0.3} \exp\Big\{-\frac{\bar{e}^2}{2(0.3)^2}\Big\}\, d\bar{e}, \\
& 61.2 = \theta + \bar{e}.
\end{aligned}
$$

The reduced model contains all the information concerning unknown values. 
The error probability distribution provides probability statements concerning 
the unknown $e$, and the structural equation links the unknowns $e$ and $\theta$. 
Each possible value for $e$ corresponds to a possible value for $\theta$: 

$$\theta = 61.2 - e.$$

A probability statement concerning $e$ is ipso facto a probability statement 
concerning $\theta$. The probability statements concerning $e$ are summarized in 
the distribution $g(e)\,de$; and correspondingly the probability statements 
concerning $\theta = 61.2 - e$ are summarized in the distribution 

$$g(61.2 - \theta)\,d\theta = \frac{1}{\sqrt{2\pi}\,0.3}\exp\left\{-\frac{(\theta - 61.2)^2}{2(0.3)^2}\right\} d\theta,$$

the structural distribution for the unknown value of $\theta$. The structural 
distribution can be represented alternatively as 

$$\theta = 61.2 - 0.3z,$$

where $z$ is a standard normal variable. Some probability statements are 

$$\Pr\{60.9 < \theta < 61.5\} = 68\tfrac{1}{4}\%,$$
$$\Pr\{60.6 < \theta < 61.8\} = 95\tfrac{1}{2}\%.$$
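
The two probability statements can be checked numerically. The following is a minimal sketch, assuming only the structural distribution stated above (normal, centered at 61.2 with scale 0.3); the function names `normal_cdf` and `structural_probability` are illustrative and do not come from the text.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def structural_probability(lo, hi, centre=61.2, scale=0.3):
    """Pr{lo < theta < hi} when theta = centre - scale * z, z standard normal."""
    return normal_cdf(hi, centre, scale) - normal_cdf(lo, centre, scale)


# One and two scale units on either side of 61.2.
p_one = structural_probability(60.9, 61.5)
p_two = structural_probability(60.6, 61.8)
print(f"Pr(60.9 < theta < 61.5) = {p_one:.4f}")
print(f"Pr(60.6 < theta < 61.8) = {p_two:.4f}")
```

The computed values agree with the rounded percentages quoted above.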

Now, more generally, consider the simple measurement model with 
normal error. The reduced model is 



$$x = \theta + e.$$


The structural distribution describing the unknown $\theta$ is given by 



Now, in general, consider the simple measurement model. The reduced 
model is 

$$g(r)\,dr,$$
$$r(x) = \theta + r.$$

The error probability distribution provides probability statements concerning 
the unknown $r$, and the structural equation links the two unknowns $\theta$ and $r$. 
Each possible value for $r$ corresponds to a possible value for $\theta$: 

$$\theta = r(x) - r.$$

The error distribution describing the unknown $r$ is thus equivalent to the 
structural distribution 

$$g(r(x) - \theta)\,d\theta$$

describing the unknown value $\theta$ of the quantity. 
THE MEASUREMENT MODEL 

11 THE MODEL 

Consider a system that can be operated under stable conditions. Suppose 
that, when operated repetitively under stable conditions, the internal error 
mechanisms produce a known error pattern — in a particular response as 
measured on a certain scale. And suppose this error pattern in some arbitrary 
units has the form of independent realizations of an error variable $e$ with 
probability element $f(e)\,de$ on the real line $R^1$. 

Now consider a particular set of conditions for the system. These con- 
ditions determine the characteristics of the response: the spread of the error 
pattern as given by a scale factor applied to the error variable (for numerical 
expression this requires a unit of measurement); and the general level of 
the response as given by a translation of the scaled error (for expression this 
needs an origin of measurement). Let $(x_1, \ldots, x_n) = x'$ be a sequence of $n$ 
observations on the response, and let $\sigma$ be the unknown scale factor for the 
error and $\mu$ be the unknown general level of the response. The assumptions 
then give the 

Measurement Model 

$$\prod f(e_i) \prod de_i,$$
$$x_1 = \mu + \sigma e_1,$$
$$\vdots$$
$$x_n = \mu + \sigma e_n.$$

The model has two parts: an error distribution $\prod f(e_i) \prod de_i$ which describes 
the variation in the multiple operation of the system (with $e$'s as variables); 
and a structural equation $x = [\mu, \sigma]e$ (vector notation) in which a realized 
vector $e$ from the error distribution has determined the relation between the 
known observation $x$ and the unknown system characteristics $\sigma$ and $\mu$ (with 
$e$ as a constant). 

The conditions of the system determine the characteristics $\sigma$ and $\mu$; these 
characteristics appear as a transformation $[\mu, \sigma]$ which has a positive scaling 
factor $\sigma$ and a relocation $\mu$. Such a transformation is an element of the positive 
affine group 

$$G = \{[a, c] : -\infty < a < \infty,\ 0 < c < \infty\}.$$
Figure 8 The affine group with three subgroups labeled. A group element $[a, c]$ can be 
represented as the point $[a, c]$ or as the transformation carrying $i = [0, 1]$ into the point 
$[a, c]$. 


The set G is closed under the formation of products and inverses (formulas 
for product and inverse in Section 2). Accordingly G is a group. 

A subset of a group that is itself a group using the same multiplication is 
called a subgroup. The positive affine group is a subgroup of the affine group, 
the “positive half” of the affine group. See Figure 8. 
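
The group property can be illustrated numerically. The sketch below assumes the product and inverse rules that follow directly from the action $x \mapsto a + cx$, namely $[a, c][a', c'] = [a + ca', cc']$ and $[a, c]^{-1} = [-a/c, 1/c]$; the formulas of Section 2 are not reproduced in this chapter, so their agreement with these rules is an assumption. The names `Affine`, `compose`, `inverse`, and `apply` are illustrative.

```python
from typing import NamedTuple


class Affine(NamedTuple):
    """A positive affine transformation [a, c]: x -> a + c * x, with c > 0."""
    a: float
    c: float


def apply(g, x):
    """Apply [a, c] coordinate by coordinate to a point x (a list of numbers)."""
    return [g.a + g.c * xi for xi in x]


def compose(g, h):
    """Product g h: apply h first, then g."""
    return Affine(g.a + g.c * h.a, g.c * h.c)


def inverse(g):
    """Inverse transformation, so that compose(g, inverse(g)) is the identity."""
    return Affine(-g.a / g.c, 1.0 / g.c)


IDENTITY = Affine(0.0, 1.0)

g = Affine(2.0, 0.5)
h = Affine(-1.0, 4.0)
gh = compose(g, h)

# Closure: the product and the inverse again have a positive scale factor.
print(gh, inverse(g))
# Applying the product equals applying h and then g.
print(apply(gh, [1.0, 2.0]), apply(g, apply(h, [1.0, 2.0])))
```

Because the scale factors multiply and stay positive, products and inverses remain in the positive half, which is the closure asserted in the text.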

12 THE ORBITS 

Consider how the positive affine group $G$ affects Euclidean space $R^n$. 
The transformations $[a, c]$ carry a point $x$ into the orbit of $x$: 

$$Gx = \{[a, c]x : -\infty < a < \infty,\ 0 < c < \infty\}$$

$$= \{a\mathbf{1} + cx : -\infty < a < \infty,\ 0 < c < \infty\}.$$

The orbit is a half-plane, the half-plane passing through $x$ bordered by the 
extended 1-vector, but not including the extended 1-vector. See Figure 9. 
For the special case of a point $x$ on the line through the 1-vector, the orbit 
is that line; this orbit with its special form can be excluded from $R^n$ with no 
essential loss of generality in the sequel. Two orbits are either identical or 
disjoint: the orbits form a partition of the space $R^n$ (see Problem 23). 
Figure 9 The orbit of x under the positive affine group. 

Alternatively, the effect of the group can be examined by considering $n$ 
numbered points $x_1, \ldots, x_n$ on the real line. A transformation $[a, c]$ carries 
these points into the $n$ numbered points $\tilde{x}_1, \ldots, \tilde{x}_n$, where $\tilde{x}_i = a + cx_i$. 
The order of the points is unchanged, and the relative spacings between the 
points are unchanged. Only the location and scaling of the array of points 
are changed. See Figure 10. 

Figure 10 A transformation [a, c] applied to four points. 

Consider a simple variable to describe the position of a point $x$ on its orbit 
(or the location and scaling of $n$ numbered points on the real line). As an 
exploratory choice consider $\bar{x}$ to measure location, $s_x$ to measure scaling, and 
$[\bar{x}, s_x]$ to measure position: 

$$\bar{x} = \frac{1}{n}\sum x_i, \qquad s_x^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2.$$

A transformation $[a, c]$ carries the point $x$ into the point $\tilde{x} = [a, c]x$. The 
effect on the variables $\bar{x}$, $s_x$ is 

$$\bar{\tilde{x}} = a + c\bar{x},$$
$$s_{\tilde{x}} = cs_x,$$

which can be written 

$$[\bar{\tilde{x}}, s_{\tilde{x}}] = [a, c][\bar{x}, s_x].$$

Thus, the transformation carries $x$ into $[a, c]x$ and correspondingly carries 
$[\bar{x}, s_x]$ into $[a, c][\bar{x}, s_x]$. See Figure 11. 


Figure 11 A transformation $[a, c]$ and its effect on the variable $[\bar{x}, s_x]$. 
The inverse transformation $[\bar{x}, s_x]^{-1}$ can be used to reduce a general point 
$x$ to a standard or reference point on its orbit: 

$$[\bar{x}, s_x]^{-1}x = \left(\frac{x_1 - \bar{x}}{s_x}, \ldots, \frac{x_n - \bar{x}}{s_x}\right) = d(x).$$

The reference point has location $\bar{d} = 0$ and scale $s_d = 1$: 

$$[\bar{d}, s_d] = [\bar{x}, s_x]^{-1}[\bar{x}, s_x] = [0, 1] = i.$$

The reference points $d(x)$ index the orbits $Gx$. 

The general point $x$ can be reconstructed from its position and its orbit: 

$$x = [\bar{x}, s_x]\,d(x).$$

The variable $[\bar{x}, s_x]$ takes values in the group $G$. The variable $[\bar{x}, s_x]$, however, 
is used for position rather than for transformation; accordingly the range of 
$[\bar{x}, s_x]$ is designated $G^*$ to distinguish the special use. 
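
The roles of position and orbit can be seen in a short computation. This is a sketch under the assumption that $s_x$ is the sample standard deviation with divisor $n - 1$ (consistent with $\sum d_i^2 = n - 1$ used later in the chapter); the four numbers are borrowed from the example in Section 16, and the function names are illustrative.

```python
import math
import statistics


def position(x):
    """Position [x_bar, s_x] of a point x on its orbit (s_x uses divisor n - 1)."""
    return statistics.mean(x), statistics.stdev(x)


def reference_point(x):
    """d(x) = [x_bar, s_x]^{-1} x: the standardized residuals."""
    x_bar, s_x = position(x)
    return [(xi - x_bar) / s_x for xi in x]


def transform(a, c, x):
    """Apply the positive affine transformation [a, c] to x."""
    return [a + c * xi for xi in x]


x = [62.0, 60.5, 60.7, 61.6]
x_bar, s_x = position(x)
d = reference_point(x)

# Move x along its orbit: position changes as [a, c][x_bar, s_x], d(x) does not.
y = transform(10.0, 3.0, x)
y_bar, s_y = position(y)
print(math.isclose(y_bar, 10.0 + 3.0 * x_bar), math.isclose(s_y, 3.0 * s_x))
print(all(math.isclose(u, v, abs_tol=1e-9) for u, v in zip(d, reference_point(y))))

# Reconstruct x from its position and its orbit: x = [x_bar, s_x] d(x).
rebuilt = transform(x_bar, s_x, d)
print(all(math.isclose(u, v) for u, v in zip(x, rebuilt)))
```

The point must not lie on the line through the 1-vector, since then $s_x = 0$ and the reference point is undefined; this matches the exclusion made in Section 12.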

Now consider in general a variable to describe position on an orbit. 

Definition 3. $[b(x), s(x)]$ is a transformation variable if 

$$[b([a, c]x), s([a, c]x)] = [a, c][b(x), s(x)]$$

for all $x$, $a$, $c$; or equivalently if 

$$b([a, c]x) = a + cb(x),$$
$$s([a, c]x) = cs(x)$$

for all $x$, $a$, $c$ $(c > 0)$. 

As examples consider 



where $x_{(i)}$ designates the $i$th smallest value of $x_1, \ldots, x_n$. 

A transformation variable $[b(x), s(x)]$ leads to a reference point on each 
orbit, the point at which the variable equals the identity: 

$$d(x) = [b(x), s(x)]^{-1}x = \left(\frac{x_1 - b(x)}{s(x)}, \ldots, \frac{x_n - b(x)}{s(x)}\right),$$

$$[b(d), s(d)] = [b(x), s(x)]^{-1}[b(x), s(x)] = [0, 1].$$

The reference points $d(x)$ index the orbits $Gx$. 

The general point $x$ can be reconstructed from position and orbit: 

$$x = [b(x), s(x)]\,d(x).$$
Two transformation variables are simply related: On any orbit they differ 
by right multiplication by a group element: 

$$[b_1(x), s_1(x)] = [b_2(x), s_2(x)][a, c].$$

The group element in general depends on the orbit; see Problem 24. 

Now consider the measurement model 

$$\prod f(e_i) \prod de_i,$$
$$x = [\mu, \sigma]e.$$

The points $x$ and $e$ are on the same orbit: 

$$Gx = Ge \quad\text{or}\quad d(x) = d(e).$$

The positions of the points $x$ and $e$ differ by a transformation $[\mu, \sigma]$: 

$$[\bar{x}, s_x] = [\mu, \sigma][\bar{e}, s_e].$$

The measurement model can then be reexpressed with composite structural 
equation: 

$$[\bar{x}, s_x] = [\mu, \sigma][\bar{e}, s_e], \qquad Gx = Ge.$$

Or, with transformation variable $[b(x), s(x)]$, it can be reexpressed as 

$$\prod f(e_i) \prod de_i,$$
$$[b(x), s(x)] = [\mu, \sigma][b(e), s(e)], \qquad Gx = Ge.$$

13 HOMOGENEITY 

Consider a positive affine transformation $[a, c]$ and view the transformation 
as providing new coordinates for given points. The transformation rescales 
by the factor $c$ and then relocates by the amount $a$: 

$$\tilde{x} = a + cx.$$

The observation vector becomes 

$$\tilde{x} = [a, c]x.$$

The scale characteristic $\sigma$ becomes $\tilde{\sigma} = c\sigma$, and the response level $\mu$ becomes 
$\tilde{\mu} = a + c\mu$; accordingly, the system characteristic $[\mu, \sigma]$ becomes 

$$[\tilde{\mu}, \tilde{\sigma}] = [a, c][\mu, \sigma].$$

The transformation does not touch the physical problem being examined; 
it affects only the numerical description of the quantities involved. 

Consider the effect of the transformation on the model. The structural 
equation 

$$x = [\mu, \sigma]e$$

becomes 

$$\tilde{x} = [\tilde{\mu}, \tilde{\sigma}]e.$$

Thus the model 

$$\prod f(e_i) \prod de_i,$$
$$x = [\mu, \sigma]e,$$

in the original coordinates becomes 

$$\prod f(e_i) \prod de_i,$$
$$\tilde{x} = [\tilde{\mu}, \tilde{\sigma}]e,$$

in terms of the new coordinates. The physical problem is untouched by the 
transformation [a, c ] ; the model conforms and has the same form after the 
transformation. The model is homogeneous under the positive affine group. 

14 REDUCTION 

Consider an application of the measurement model 

$$\prod f(e_i) \prod de_i,$$
$$[\bar{x}, s_x] = [\mu, \sigma][\bar{e}, s_e], \qquad Gx = Ge.$$

The error distribution $\prod f(e_i) \prod de_i$ describes the internal error of the system 
as it affects the response; it describes the random process that generated the 
realized error $e$ in the structural equation. The structural equation gives the 
relation between the known value $x$ and the unknown values $\mu$, $\sigma$, $e$. 

Now suppose the system is being examined in isolation, with no outside 
information concerning the unknowns; and consider the information in the 
structural equation concerning the unknowns $\mu$, $\sigma$, $e$. See Figure 12. 

The orbit of e is known : Ge = Gx. And the information about the orbit 
is in the form of an event based on the variable Ge for the random process e. 

Now consider the position of $e$ on its orbit. The second part of the structural 
equation can be solved; it gives 

$$[\bar{e}, s_e] = [\mu, \sigma]^{-1}[\bar{x}, s_x] = [A, C][\bar{x}, s_x].$$

This equation represents the position of $e$ as an unknown transformation 
$[A, C]$ applied to $[\bar{x}, s_x]$. If the known position were different, $[\bar{\tilde{x}}, s_{\tilde{x}}] = 
[a, c][\bar{x}, s_x]$ for example, then the structural equation would give 

$$[\bar{e}, s_e] = [\mu, \sigma]^{-1}[a, c][\bar{x}, s_x] = [A, C][\bar{x}, s_x],$$

where $[A, C] = [\mu, \sigma]^{-1}[a, c]$ is again an unknown transformation 
(homogeneity of the model). Different values for the position would provide the 
same description for $[\bar{e}, s_e]$. There is thus no information from the structural 
equation concerning the position of $e$ on its orbit. 
Figure 12 The known $[\bar{x}, s_x]$; the unknowns $[\bar{e}, s_e]$, $[\mu, \sigma]$. 

In Summary. The error distribution describes the random process that 
generated the unknown $e$ in the structural equation. The only other information 
concerning the unknown $e$ has the form $Ge = Gx$, an event for the 
random process that generated $e$. The conditions are fulfilled for making 
probability statements concerning unknown constants: exact probability 
statements can be made concerning the unknown error $e$; they are based on 
the conditional distribution of the error variable $e$ given the orbit $Ge = Gx$. 

15 THE REDUCED MODEL 

Consider the derivation of the conditional distribution of the error variable 
e given the value of the orbit Ge. The standard method used in Section / 
would proceed as follows: The two variables $\bar{e}$, $s_e$ describing position are 
supplemented by $n - 2$ variables describing the orbit; the Jacobian to the 
new variables is calculated; the joint density for the new variables is derived; 
the conditional density is obtained by normalizing over the variables $\bar{e}$, $s_e$. 
Consider instead a method based on the transformations that generate 
orbits. This alternative method is easier here, even with the added explanation 
necessary for its introduction; for more complex problems it is simple and 
direct. 

Figure 13 An element $V$ at the reference point $d$ and its images $[A, C]V$ under the group $G$. 

Consider a neighborhood or element V at the reference point d(x) = d. 
And consider the effect of transformations in G on this element. See Figure 
13. The transformations [A, C ] carry this element point-for-point along 
orbits: Position is changed but orbit is not changed. 

Consider first the effect of transformations in $G$ as applied to the 
coordinates of $R^n$. The transformation 

$$\tilde{e} = [A, C]e$$

is a diagonal transformation, 

$$\tilde{e}_1 = A + Ce_1,$$
$$\vdots$$
$$\tilde{e}_n = A + Ce_n,$$

with Jacobian determinant 

$$\left|\frac{\partial \tilde{e}}{\partial e}\right| = C^n;$$

the transformation $[A, C]$ applied to a point $e$ changes Euclidean volume 
in $R^n$ by the factor $C^n$. The special transformation $[\bar{e}, s_e]$ applied to the 
element $V$ at the reference point $d$ changes volume by the factor $s_e^n$. This 
factor can be used as a compensating factor to produce an adjusted differential 

$$dm(e) = \frac{\prod de_i}{s_e^n},$$

an adjusted differential that gives the same volume measure to all the images 
$[A, C]V$ of $V$: 

$$\frac{\prod d\tilde{e}_i}{s_{\tilde{e}}^n} = \frac{C^n \prod de_i}{C^n s_e^n} = \frac{\prod de_i}{s_e^n}.$$


Note: A probability element such as $\prod f(e_i) \prod de_i$ is also an adjusted 
differential: the volume $\prod de_i$ of an element at $e$ is adjusted by the factor 
$\prod f(e_i)$ to give the probability associated with that element. For the invariant 
differential $s_e^{-n} \prod de_i$ the volume $\prod de_i$ of an element at $e$ is adjusted by the 
factor $s_e^{-n}$ to give the volume of the corresponding element at the reference 
point $d(e)$; the construction thus shows that this adjusted differential is the 
unique differential that is invariant under the transformations and agrees with 
Euclidean volume at the reference points $d$ (where $s_d = 1$). 

Consider the effect of transformations in G as applied to coordinates 
describing orbit and position on orbit. The transformations do not affect 
coordinates describing orbit: transformations carry points along orbits. 
Accordingly, any differential in terms of coordinates describing orbit is an 
invariant differential. 

The transformations affect only position on orbit. The transformation 

$$\bar{\tilde{e}} = A + C\bar{e},$$
$$s_{\tilde{e}} = Cs_e$$

is diagonal with Jacobian determinant $C^2$; the transformation $[A, C]$ changes 
Euclidean area (on $G^*$) by the factor $C^2$. The special transformation $[\bar{e}, s_e]$ 
applied to the element $V$ at the identity $[0, 1]$ changes area by the factor $s_e^2$. 
(See Figure 14.) This factor can be used as a compensating factor to produce 
an adjusted differential 

$$d\mu[\bar{e}, s_e] = \frac{d\bar{e}\,ds_e}{s_e^2}$$

that gives the same adjusted area to all the images $[A, C]V$ of $V$. The 
construction shows that this adjusted differential is the unique differential that is 
invariant under the transformations and agrees with Euclidean area at the 
identity $[0, 1]$. 

Figure 14 The volume element $V$ and its images as viewed in terms of coordinates $[\bar{e}, s_e]$ 
in $G^*$. 

Now consider the two invariant differentials as they apply to the element $V$ 
in $R^n$. Let $\delta(d)$ be their ratio at the reference point: 

$$dm(e) = \frac{\prod de_i}{s_e^n} = \delta(d)\,\frac{d\bar{e}\,ds_e}{s_e^2} = \delta(d)\,d\mu[\bar{e}, s_e].$$

The differentials, however, are unaffected by transformations $[A, C]$. The 
equality then holds generally, as the element $V$ is transformed along orbits; 
the ratio $\delta(d)$ is a differential that measures $V$ at right angles to the orbit. 
This provides the change of variables from a volume element in the original 
coordinates to a volume element in position coordinates conditional on a 
neighborhood of the orbit $d(e) = d$: 

$$\frac{\prod de_i}{s_e^n} = \delta(d)\,\frac{d\bar{e}\,ds_e}{s_e^2}.$$

The probability element for $e$ on $R^n$ is 

$$\prod f(e_i) \prod de_i = \prod f(e_i)\,s_e^n \cdot \frac{\prod de_i}{s_e^n}.$$

In terms of the position variable $[\bar{e}, s_e]$ conditional on a neighborhood of 
the orbit $d(e) = d$, the probability element becomes 

$$\prod f(e_i)\,s_e^n\,\delta(d)\,\frac{d\bar{e}\,ds_e}{s_e^2} = \prod f(\bar{e} + s_e d_i)\,s_e^{n-2}\,\delta(d)\,d\bar{e}\,ds_e.$$

The conditional probability element for $[\bar{e}, s_e]$ given $d(e) = d$ is obtained by 
normalization: 

$$g(\bar{e}, s_e : d)\,d\bar{e}\,ds_e = k(d) \prod f([\bar{e}, s_e]d_i)\,s_e^{n-2}\,d\bar{e}\,ds_e.$$

The constant $k(d)$ is the normalizing constant: 

$$k^{-1}(d) = \int_0^\infty \int_{-\infty}^\infty \prod f(\bar{e} + s_e d_i)\,s_e^{n-2}\,d\bar{e}\,ds_e.$$

The conditional distribution described in the preceding section has now 
been derived. The measurement model by its own information content 
produces the 


Reduced Measurement Model 

$$g(\bar{e}, s_e : d(x))\,d\bar{e}\,ds_e,$$
$$[\bar{x}, s_x] = [\mu, \sigma][\bar{e}, s_e].$$

The model has two parts: an error probability distribution $g(\bar{e}, s_e : d)\,d\bar{e}\,ds_e$ 
(with $[\bar{e}, s_e]$ as variable in the upper half-plane $G^*$) which provides probability 
statements for the unknown $[\bar{e}, s_e]$ in the structural equation; and a structural 
equation which gives the relation between the known $[\bar{x}, s_x]$ in $G^*$ and the 
unknowns $[\mu, \sigma]$ in $G$ and $[\bar{e}, s_e]$ in $G^*$ (with $[\bar{e}, s_e]$ as a constant). 

Any two position variables on an orbit are related by right multiplication 
by a group element; see Problem 24. The conditional probability element for a 
general position variable has then the form 

$$g(b, s : d)\,db\,ds = k(d) \prod f([b, s]d_i)\,s^{n-2}\,db\,ds,$$

where $d = d(x) = [b(x), s(x)]^{-1}x$ is the reference point on $Gx$ and $k(d)$ is the 
normalizing constant. 

16 EXAMPLES 

As a first example consider the measurement model with standard normal 
error variable: 

$$\prod f(e_i) \prod de_i = (2\pi)^{-n/2} \exp\left\{-\tfrac{1}{2}\sum e_i^2\right\} \prod de_i,$$
$$x = [\mu, \sigma]e.$$

The transformation variable $[\bar{e}, s_e]$ is convenient for the case of normal error. 
The conditional distribution of $[\bar{e}, s_e]$ given $d$ is 

$$g(\bar{e}, s_e : d)\,d\bar{e}\,ds_e = k(d)(2\pi)^{-n/2} \exp\left\{-\tfrac{1}{2}\sum(\bar{e} + s_e d_i)^2\right\} s_e^{n-2}\,d\bar{e}\,ds_e$$
$$= k' \exp\left\{-\tfrac{1}{2}[n\bar{e}^2 + (n-1)s_e^2]\right\} s_e^{n-2}\,d\bar{e}\,ds_e.$$

The simplification in the exponent uses $\sum d_i = 0$, $\sum d_i^2 = n - 1$ ($\bar{d} = 0$, 
$s_d = 1$). 

$$g(\bar{e}, s_e : d)\,d\bar{e}\,ds_e = k'' \exp\left\{-\frac{n\bar{e}^2}{2}\right\} d\bar{e} \cdot \exp\left\{-\frac{(n-1)s_e^2}{2}\right\} (s_e^2)^{(n-1)/2-1}\,ds_e^2$$

$$= \frac{\sqrt{n}}{\sqrt{2\pi}} \exp\left\{-\frac{n\bar{e}^2}{2}\right\} d\bar{e} \cdot \frac{1}{\Gamma((n-1)/2)} \exp\left\{-\frac{(n-1)s_e^2}{2}\right\} \left(\frac{(n-1)s_e^2}{2}\right)^{(n-1)/2-1} d\,\frac{(n-1)s_e^2}{2}.$$

The density factors so that the variables separate; the two factors are of 
normal and chi-square form; the usual normal and chi-square normalizing 
constants are introduced. 

The conditional error distribution has the form: $\bar{e}$ is normal with mean 0 
and variance $1/n$; $(n-1)s_e^2$ is chi-square on $n - 1$ degrees of freedom; $\bar{e}$ 
and $s_e$ are statistically independent. It is of interest that the conditional 
distribution of $\bar{e}$, $s_e$ does not depend on the value of $d$; the conditional 
distribution of $\bar{e}$, $s_e$ is thus the same as the marginal distribution of $\bar{e}$, $s_e$. 

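A chi-square density function on $f$ degrees of freedom can be manipulated: 

$$\frac{1}{\Gamma(f/2)} \exp\left\{-\frac{\chi^2}{2}\right\} \left(\frac{\chi^2}{2}\right)^{f/2-1} d\,\frac{\chi^2}{2} = \frac{A_f}{(2\pi)^{f/2}} \exp\left\{-\frac{\chi^2}{2}\right\} \chi^{f-1}\,d\chi, \qquad A_f = \frac{2\pi^{f/2}}{\Gamma(f/2)}.$$

The conditional (or marginal) distribution can then be written 

$$g(\bar{e}, s_e : d)\,d\bar{e}\,ds_e = (2\pi)^{-n/2} \exp\left\{-\frac{n\bar{e}^2}{2} - \frac{(n-1)s_e^2}{2}\right\} A_{n-1}\left(\sqrt{n-1}\,s_e\right)^{n-2} d(\sqrt{n}\,\bar{e})\,d(\sqrt{n-1}\,s_e).$$

The density, as a marginal density, is the original density 

$$(2\pi)^{-n/2} \exp\left\{-\tfrac{1}{2}\sum e_i^2\right\} = (2\pi)^{-n/2} \exp\left\{-\tfrac{1}{2}n\bar{e}^2 - \tfrac{1}{2}(n-1)s_e^2\right\}$$

multiplied by the factor 

$$A_{n-1}\left(\sqrt{n-1}\,s_e\right)^{n-2}.$$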

The factor must give the result of integrating over the $(n - 2)$-dimensional 
region corresponding to a value for the variable 

$$(\sqrt{n}\,\bar{e},\ \sqrt{n-1}\,s_e),$$

the original density being constant on this region. The region is in the 
$(n - 1)$-dimensional linear subspace corresponding to a value for $\sqrt{n}\,\bar{e} = \sum e_i/\sqrt{n}$, 
that is, corresponding to a value of $e_1 + \cdots + e_n$. In this subspace it 
corresponds to a value for $(n - 1)s_e^2 = \sum(e_i - \bar{e})^2$. The region is a sphere 
of radius $\sqrt{n-1}\,s_e$ in the $(n - 1)$-dimensional linear subspace. See Figure 
15. Thus the factor gives the area of a sphere of radius $\sqrt{n-1}\,s_e$ in $n - 1$ 
dimensions, and $A_{n-1}$ gives the area of a unit sphere in $n - 1$ dimensions. 
This derivation of the area $A_f$ of a unit sphere in $f$ dimensions uses: the 
invariant differentials as derived from Jacobians (involving local properties); 
and the one-dimensional integrations that give the normalizing constants for 
the normal and gamma density functions. No integration is needed in more 
than one dimension. 

Figure 15 The invariant differential using coordinates in $R^n$ and coordinates in the group. 
The area $A_{n-1}$ of a unit sphere in the $(n - 1)$-dimensional subspace $\bar{e} = 0$. The variable 
$\sqrt{n}\,\bar{e}$ measures distance in $R^n$ parallel to the 1-vector, and $\sqrt{n-1}\,s_e$ measures distance in 
$R^n$ orthogonal to the 1-vector. 
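
The expression $A_f = 2\pi^{f/2}/\Gamma(f/2)$ that appears in the chi-square manipulation can be checked against the familiar low-dimensional cases. The following is a small sketch; the function name `unit_sphere_area` is illustrative.

```python
import math


def unit_sphere_area(f):
    """Surface area A_f of the unit sphere in f dimensions: 2 pi^(f/2) / Gamma(f/2)."""
    return 2.0 * math.pi ** (f / 2.0) / math.gamma(f / 2.0)


# f = 1: two points; f = 2: circumference 2*pi; f = 3: surface area 4*pi.
for f in (1, 2, 3, 4):
    print(f, unit_sphere_area(f))
```

A sphere of radius $r$ in $f$ dimensions then has area $A_f r^{f-1}$, which is the form of the factor $A_{n-1}(\sqrt{n-1}\,s_e)^{n-2}$ with $f = n - 1$.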

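The reduced model can now be expressed in the compact form: 

$$\bar{e} = \frac{z}{\sqrt{n}}, \qquad s_e = \frac{\chi_{n-1}}{\sqrt{n-1}},$$
$$\bar{x} = \mu + \sigma\bar{e},$$
$$s_x = \sigma s_e.$$

The error distribution is described by means of a standard normal variable 
$z$ and a chi-variable $\chi_{n-1}$ on $n - 1$ degrees of freedom. 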

Suppose the measurements are 

62.0, 60.5, 60.7, 61.6. 

The measurement model is 

$$62.0 = \mu + \sigma e_1,$$
$$60.5 = \mu + \sigma e_2,$$
$$60.7 = \mu + \sigma e_3,$$
$$61.6 = \mu + \sigma e_4.$$

The position values are $\bar{x} = 61.2$, $s_x = 0.72$; accordingly, the reduced model is 

$$\bar{e} = \frac{z}{\sqrt{4}}, \qquad s_e = \frac{\chi_3}{\sqrt{3}},$$
$$61.2 = \mu + \sigma\bar{e},$$
$$0.72 = \sigma s_e.$$
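
The position values and the solution of the structural equations can be reproduced directly. This sketch assumes $s_x$ is computed with divisor $n - 1$; the helper `solve_structural` is a hypothetical name, and the error value passed to it is chosen only for illustration.

```python
import statistics

measurements = [62.0, 60.5, 60.7, 61.6]

# Position of the observed point on its orbit.
x_bar = statistics.mean(measurements)
s_x = statistics.stdev(measurements)  # divisor n - 1
print(round(x_bar, 1), round(s_x, 2))


def solve_structural(e_bar, s_e):
    """Solve x_bar = mu + sigma * e_bar, s_x = sigma * s_e for (mu, sigma)."""
    sigma = s_x / s_e
    mu = x_bar - sigma * e_bar
    return mu, sigma


# Illustrative error position only: e_bar = 0, s_e = 1 returns the observed position.
mu, sigma = solve_structural(0.0, 1.0)
print(mu, sigma)
```

Each admissible error position $(\bar{e}, s_e)$ with $s_e > 0$ gives exactly one pair $(\mu, \sigma)$, which is the one-to-one link used for the structural distribution in Section 18.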

The error probability distribution with $\bar{e}$, $s_e$ as variables gives probability 
statements for the unknown error values $(\bar{e}, s_e)$ in the structural equation. 
For example, 

$$\Pr\{-1 < \bar{e} < 1\} = 95\tfrac{1}{2}\%,$$

$$\Pr\left\{\frac{\sqrt{0.216}}{\sqrt{3}} < s_e < \frac{\sqrt{9.35}}{\sqrt{3}}\right\} = 95\%.$$
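
Both statements can be verified with one-dimensional distribution functions. The sketch below assumes $n = 4$, so that $\bar{e}$ is normal with variance $1/4$ and $3s_e^2$ is chi-square on 3 degrees of freedom; it uses the closed form of the chi-square distribution function available for 3 degrees of freedom. The function names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chi2_cdf_3df(x):
    """Chi-square CDF on 3 degrees of freedom (closed form, x >= 0)."""
    return math.erf(math.sqrt(x / 2.0)) - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


n = 4

# e_bar = z / sqrt(n), so -1 < e_bar < 1 is -sqrt(n) < z < sqrt(n).
p_e = normal_cdf(math.sqrt(n)) - normal_cdf(-math.sqrt(n))

# s_e between sqrt(0.216/3) and sqrt(9.35/3) means 3 s_e^2 between 0.216 and 9.35.
p_s = chi2_cdf_3df(9.35) - chi2_cdf_3df(0.216)

print(f"Pr(-1 < e_bar < 1) = {p_e:.4f}")
print(f"Pr for s_e interval = {p_s:.4f}")
```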

As a second example consider the measurement model with error variable 
uniformly distributed on the interval (0, 1): 

$$x = [\mu, \sigma]e,$$

where 

$$f(e) = 1, \quad 0 < e < 1,$$
$$\phantom{f(e)} = 0 \quad \text{otherwise}.$$

The transformation variable 

$$[L, R] = [\min e_i,\ \max e_i - \min e_i]$$

leads to a simple form for the conditional distribution, and its choice avoids 
a later change of reference point to gain simplicity. The conditional error 
distribution for $[L, R]$ given 

$$d = \left(\frac{e_1 - L}{R}, \ldots, \frac{e_n - L}{R}\right)$$

is 

$$g(L, R : d)\,dL\,dR = k(d) f(L + Rd_1) \cdots f(L + Rd_n) R^{n-2}\,dL\,dR$$
$$= n(n-1)\,\varphi(L, R) R^{n-2}\,dL\,dR.$$

The indicator function $\varphi$ gives the range of nonzero density: 

$$\varphi(L, R) = 1, \quad 0 < L < L + R < 1,$$
$$\phantom{\varphi(L, R)} = 0 \quad \text{otherwise}.$$

See Figure 16. The normalizing constant is obtained by integration: 

$$\iint \varphi(L, R) R^{n-2}\,dL\,dR = \int_0^1 \int_0^{1-R} R^{n-2}\,dL\,dR$$

$$= \int_0^1 (R^{n-2} - R^{n-1})\,dR$$

$$= \frac{1}{n-1} - \frac{1}{n} = \frac{1}{n(n-1)}.$$
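
The normalizing constant $n(n-1)$ can be confirmed both exactly and by a numerical integration. This is a sketch only; the function names and the midpoint rule with 2000 steps are choices made for illustration.

```python
from fractions import Fraction


def inverse_constant_exact(n):
    """Exact value of the integral of R^(n-2) - R^(n-1) over (0, 1)."""
    return Fraction(1, n - 1) - Fraction(1, n)


def inverse_constant_numeric(n, steps=2000):
    """Midpoint-rule approximation of the same one-dimensional integral."""
    h = 1.0 / steps
    total = 0.0
    for j in range(steps):
        r = (j + 0.5) * h
        total += (r ** (n - 2) - r ** (n - 1)) * h
    return total


for n in (2, 3, 4, 10):
    exact = inverse_constant_exact(n)
    # The normalizing constant k is the reciprocal, n (n - 1).
    print(n, exact, 1 / exact, inverse_constant_numeric(n))
```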
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Figure 16 The region of positive density for the error position [L, JR], 
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can be compared with the distribution of 

Vif = 

% 3 /V 3 

the /-distribution on three degrees of freedom. See Figure 17 The value 
— 3.33 for Z* is just beyond the 2\% point on the left-hand tail of t e is 
tribution; and it suggests moderately that the hypothesis is not true. 

Now consider in general the measurement model 

$$k(d) \prod f([\bar{e}, s_e]d_i)\,s_e^{n-2}\,d\bar{e}\,ds_e,$$
$$[\bar{x}, s_x] = [\mu, \sigma][\bar{e}, s_e].$$

Suppose that an outside source has prescribed the value $\mu_0$ for $\mu$. The 
hypothesis $\mu = \mu_0$ gives 

$$\bar{x} = \mu_0 + \sigma\bar{e},$$
$$s_x = \sigma s_e,$$

and hence produces the value 

$$t = \frac{\bar{e}}{s_e} = \frac{\sigma\bar{e}}{\sigma s_e} = \frac{\bar{x} - \mu_0}{s_x}$$

for the error characteristic $t$. This value for $t$ can be compared with the 
distribution of $t$ derived from the error probability distribution $g(\bar{e}, s_e : d)\,d\bar{e}\,ds_e$. 
The joint probability element for $t$ and $s_e$ is 

$$g(ts_e, s_e : d)\,s_e\,dt\,ds_e;$$

the marginal element for $t$ is then 

$$g_L(t : d)\,dt = \int_0^\infty g(ts_e, s_e : d)\,s_e\,ds_e \cdot dt$$

$$= k(d) \int_0^\infty \prod f([ts_e, s_e]d_i)\,s_e^{n-1}\,ds_e \cdot dt$$

$$= k(d) \int_0^\infty \prod f(s_e(t + d_i))\,s_e^{n-1}\,ds_e \cdot dt.$$

The hypothesis can be assessed by comparing the calculated value of $t$ with 
the distribution of values described by $g_L(t : d)\,dt$. 

Now suppose, alternatively, that an outside source has indicated the value 
$\sigma_0$ for $\sigma$. The hypothesis $\sigma = \sigma_0$ leads to the value 

$$s_e = \frac{s_x}{\sigma_0}$$

for the error variable $s_e$. This value for $s_e$ can be compared with the 
distribution for $s_e$ as obtained from the error probability distribution: 

$$g_S(s_e : d)\,ds_e = k(d) \int_{-\infty}^\infty \prod f([\bar{e}, s_e]d_i)\,d\bar{e} \cdot s_e^{n-2}\,ds_e;$$

and the hypothesis can be assessed accordingly. 

18 GENERAL INFERENCE 

Consider the measurement model and the information it contains concerning 
the unknown physical characteristic $[\mu, \sigma]$. For the numerical example 
in Section 16 it has the form 

$$\bar{e} = \frac{z}{\sqrt{4}}, \qquad s_e = \frac{\chi_3}{\sqrt{3}},$$
$$61.2 = \mu + \sigma\bar{e},$$
$$0.72 = \sigma s_e,$$

with error probability distribution that describes the unknown $[\bar{e}, s_e]$ and 
with structural equation that links the unknowns $[\bar{e}, s_e]$ and $[\mu, \sigma]$. 

Each possible value for $[\bar{e}, s_e]$ corresponds to a possible value for $[\mu, \sigma]$: 

$$[61.2, 0.72] = [\mu, \sigma][\bar{e}, s_e].$$

A probability statement concerning $[\bar{e}, s_e]$ is ipso facto a probability statement 
concerning $[\mu, \sigma]$. The probability distribution that describes the unknown 
$[\bar{e}, s_e]$ thus gives a distribution, the structural distribution, describing the 
unknown $[\mu, \sigma]$: 

$$[\mu, \sigma] = [61.2, 0.72]\left[\frac{z}{\sqrt{4}}, \frac{\chi_3}{\sqrt{3}}\right]^{-1},$$

and by using the error probability distribution to describe $[\bar{e}, s_e]$: 

$$\mu = \bar{x} - \frac{z}{\sqrt{n}} \cdot \frac{\sqrt{n-1}\,s_x}{\chi_{n-1}}, \qquad \sigma = \frac{\sqrt{n-1}\,s_x}{\chi_{n-1}}.$$

The variables $\mu$, $\sigma$ describing the structural distribution are statistically 
dependent, each variable involving $\chi_{n-1}$. 

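Now consider the general model 

$$k(d) \prod f([\bar{e}, s_e]d_i)\,s_e^{n-2}\,d\bar{e}\,ds_e,$$
$$[\bar{x}, s_x] = [\mu, \sigma][\bar{e}, s_e].$$

The error probability distribution gives probability statements for the 
unknown $[\bar{e}, s_e]$. The structural equation links the values for $[\bar{e}, s_e]$ in one-to-one 
correspondence with the values for $[\mu, \sigma]$: 

$$[\bar{x}, s_x] = [\mu, \sigma][\bar{e}, s_e] \quad\text{or}\quad [\bar{e}, s_e] = [\mu, \sigma]^{-1}[\bar{x}, s_x],$$

$$\bar{e} = \frac{\bar{x} - \mu}{\sigma}, \qquad s_e = \frac{s_x}{\sigma},$$

$$\left|\frac{\partial(\bar{e}, s_e)}{\partial(\mu, \sigma)}\right| = \left|\det\begin{pmatrix} -\dfrac{1}{\sigma} & -\dfrac{\bar{x} - \mu}{\sigma^2} \\ 0 & -\dfrac{s_x}{\sigma^2} \end{pmatrix}\right| = \frac{s_x}{\sigma^3}.$$

The distribution describing the unknown $[\bar{e}, s_e]$ gives the following structural 
distribution describing the unknown $[\mu, \sigma]$: 

$$g^*(\mu, \sigma : x)\,d\mu\,d\sigma = k(d) \prod f\left(\frac{x_i - \mu}{\sigma}\right) \frac{s_x^{n-1}}{\sigma^{n+1}}\,d\mu\,d\sigma.$$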


NOTES AND REFERENCES 

The error variable of a stable system has been used as the basic ingredient 
to develop a statistical model and a method of inference. Other approaches 
to statistics examine the exterior of a system, and use the classical model 
of statistics: a possible input value $\theta$ denoting the physical quantity; an 
output value $x$ denoting the observation; and a probability density $f(x : \theta)$ 
describing the frequency behavior of output values $x$ for any input $\theta$. The 
classical model effectively treats the system as a black box, a model that 
describes external behavior characteristics and ignores any internal operation 
or mechanisms. The other approaches need a variety of principles and 
reduction techniques to obtain solutions to problems of inference. The 
approach here with a more comprehensive model obtains unique solutions 
without additional principles and techniques; the solutions are in terms of 
classical frequency-based probability. 

The origins for the approach here lie somewhere between two diverse 
approaches to statistics, that of R. A. Fisher in England and that of the 
mathematical-statistical schools in North America. Both approaches use the 
classical model. The Fisher approach perhaps more frequently introduced 
methods stimulated by the peculiarities of the applications being studied. 

Classical models can be derived from the models studied here: the location 
model $f(x - \theta)$ is obtained from the simple measurement model, and the 
location-scale model $\sigma^{-1} f((x - \mu)/\sigma)$ from the measurement model. Fisher 
examined these classical models in 1934 and proposed the use of a conditional 
distribution given a configuration of a sample. These conditional models are 
the classical-model analogs of the reduced models in this chapter. 

Fisher (1930, 1935) also proposed distributions to describe the unknown 
value of a physical quantity. His proposal was in the framework of the classi- 
cal model; his derivations violated generally accepted rules for handling 
probabilities and models. 

Fisher, R. A. (1930), Inverse probability, Proc. Cambridge Phil. Soc., 26, 528-535. Also as 
Paper 22, Fisher (1950). 

Fisher, R. A. (1934), Two new properties of mathematical likelihood, Proc. Royal Soc. 
(London), A144, 285-307. Also as Paper 24, Fisher (1950). 

Fisher, R. A. (1935), The fiducial argument in statistical inference, Ann. Eugenics, 6, 391-398. 
Research, Hafner Publishing Co., New York. 
PROBLEMS 

1. A strength measurement on batches of steel castings has error variation that is 
approximately normal with mean 0 and standard deviation 2.5 (units of 1000 psi). For a particular 
run of castings let $\theta$ be the general strength level. A random sample of 10 castings yielded 

59.5, 61.5, 63.5, 63.0, 64.5, 

61.5, 60.0, 65.0, 59.5, 57.0. 

(i) Obtain the reduced model. 

(ii) Make central 95% and 99% probability statements for the error position. 

(iii) Derive the structural distribution for $\theta$; make 95% and 99% probability statements 
for $\theta$. 

2. For the second example in Section 8 derive the structural distribution for $\theta$; sketch the 
structural distribution. 

3. Consider the simple measurement model with error variable uniformly distributed on 
the interval $(-0.5, 0.5)$. 

(i) For the measurements $x_1 = 157.01$, $x_2 = 157.99$ derive the reduced model; derive 
the structural distribution for $\theta$. Note. The conditional distribution can be obtained from 
the general formula or by geometrical argument from the uniform distribution of $(e_1, e_2)$ 
over the square $(-0.5, 0.5) \times (-0.5, 0.5)$. 

(ii) For the measurements 157.01, 157.99, 157.68, 157.92, 157.48 derive the reduced 
model; derive the structural distribution for $\theta$. Do the additional three measurements add 
information concerning $\theta$? 

(iii) Test the hypothesis that $\theta = 157.60$. 

4. Consider the simple measurement model with normal error in Section 8. Use the location 
variable $r(x) = \min x_i$ with $d_i = x_i - \min x_i$, and show that the conditional error 
distribution for $\min e_i$ is normal with mean $-\bar{d}$ and variance $\sigma^2/n$. Check to see if this is equivalent 
to the conditional error distribution in Section 8. 

5. Consider the simple measurement model with component error density 

$$f(e) = \exp\{-e\}, \quad e > 0,$$
$$\phantom{f(e)} = 0, \quad e < 0.$$

(i) Derive the reduced model using $r(x) = \min x_i$ as location variable; determine the 
normalizing constant. 

(ii) Derive the structural distribution for $\theta$. 

*6. Consider the simple measurement model with double exponential component error: 

$$f(e) = \tfrac{1}{2} \exp\{-|e|\}.$$

Notation. Let $x_{(i)}$ be the $i$th smallest of $(x_1, \ldots, x_n)$; then $x_{(1)} = \min x_i$, 
$x_{(n)} = \max x_i$; each $x_{(i)}$ is a location variable. 

(i) Derive the reduced model using $x_{(1)}$ as location variable; determine the normalizing 
constant (integration can be performed interval by interval on the real line). 

(ii) Derive the structural distribution for $\theta$. 

(iii) For the measurements 5.8, 6.5, 6.8 sketch the reduced error distribution; sketch the 
structural distribution. 

7. Consider the card-dealing example in Section 5. Find Pr {2 spades} with the additional 
information that the second participant observed both cards and would not have 
differentially reported the special case of two spades. 

8. Show that the positive affine transformations form a group: 

$$G = \{[a, c] : -\infty < a < \infty,\ 0 < c < \infty\},$$

the positive affine group or location-scale group. 

9. For the numerical example at the beginning of Section 17 test the hypothesis that $\sigma = 0.3$. 

10. A method of measuring temperature remotely has an error variable that is approximately 
standard normal. For a particular sequence of seven determinations (°C), 

683, 688, 683, 687, 692, 687, 682, 
let $\mu$ be the temperature level and $\sigma$ be the error scaling. 

I ESS 

f0 (iv) Make central 90% probability statements for /* and for a. 

11. Test the hypothesis: p = 680°C in the preceding example 

12. Consider the measurement model with component error distribution

$$f(e) = \exp\{-e\}, \quad e > 0; \qquad f(e) = 0, \quad e < 0.$$

(i) Derive the reduced model using $[b, s] = \left[x_{(1)},\ \sum_2^n (x_{(i)} - x_{(1)})/(n - 1)\right]$ as
transformation variable.

(ii) Derive the distribution of the error characteristic $t = b/s$. Sketch the structural
distribution of $\mu$.

13. Consider the measurement model with error uniformly distributed on the interval ( , ).

(i) Derive the reduced model using $[b, s] = [x_{(1)}, x_{(n)} - x_{(1)}]$ as transformation variable.

(ii) Derive the distribution of the error characteristic $t = b/s$.

(iii) Derive the structural distribution for $[\mu, \sigma]$.

*14. Consider the measurement model with double exponential component error

$$f(e) = \tfrac{1}{2} \exp\{-|e|\}.$$

(i) Derive the reduced model using the transformation variable $[b, s]$.

(ii) For the measurements 5.8, 6.5, 6.8 sketch several sections of the conditional
distribution.

(iii) Derive the distribution of the error characteristic $t = b/s$.

(iv) Sketch the structural distribution for the response level $\mu$.

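*15. The general Weibull distribution is

$$\frac{\beta}{\alpha}\left(\frac{t - \gamma}{\alpha}\right)^{\beta - 1} \exp\left\{-\left(\frac{t - \gamma}{\alpha}\right)^{\beta}\right\} dt, \quad t > \gamma,$$

with $0 < \alpha < \infty$, $0 < \beta < \infty$, $-\infty < \gamma < \infty$. Consider the measurement model with a
Weibull component error distribution ($\beta > 0$ given):

$$f(e)\,de = \beta e^{\beta - 1} \exp\{-e^{\beta}\}\,de, \quad e > 0. \quad \text{(J. Whitney.)}$$

(i) Derive the reduced model.

(ii) Derive the structural distribution for $[\mu, \sigma]$.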

*16. The general Weibull distribution can be specialized:

$$\frac{\beta}{\alpha^{\beta}}\, t^{\beta - 1} \exp\left\{-\left(\frac{t}{\alpha}\right)^{\beta}\right\} dt = \frac{\beta}{\alpha^{\beta}} \exp\{\beta \ln t - \exp\{\beta(\ln t - \ln \alpha)\}\}\, d\ln t, \quad t > 0.$$

Consider the measurement model with a Weibull component error distribution

$$f(e)\,de = \exp\{e - \exp\{e\}\}\,de. \quad \text{(J. Whitney.)}$$

(i) Derive the reduced model.

(ii) Derive the structural distribution for $[\mu, \sigma]$.

(iii) Derive the structural distribution for $\sigma$.

*17. Consider a first measurement model with error distribution $f_1(e)$, observations
$(x_1, \ldots, x_m)$, and quantities $[\mu_1, \sigma_1]$. And consider a second measurement model with
error distribution $f_2(e)$, observations $(y_1, \ldots, y_n)$, and quantities $[\mu_2, \sigma_2]$.

(i) Derive an integral expression for the structural distribution of $\mu_1 - \mu_2$.

(ii) Now suppose that $f_1(e)$, $f_2(e)$ are standard normal. Show that the structural distribution
of $\mu_1 - \mu_2$ can be represented in the form

$$\mu_1 - \mu_2 = \bar{x} - \bar{y} + r(t_1 \cos\theta + t_2 \sin\theta),$$

where $t_1$ and $t_2$ are $t$-variables on $m - 1$ and $n - 1$ degrees of freedom, respectively. Tables
for the distribution of a linear combination $t_1 \cos\theta + t_2 \sin\theta$ of $t$-variables have been
prepared by Sukhatme and are tabulated in Fisher and Yates (1949). (Fisher, 1935.)

18. Show that the rescalings

$$G = \{[0, c] : 0 < c < \infty\}$$

form a group, the scale group.

19. Let $e$ be an error variable with distribution $f(e)\,de$ on the real line $R^1$; let $\theta$ be a
quantity taking positive values; and let $x_1, \ldots, x_n$ be measurements with a multiplicative
error. The multiplicative measurement model is

$$\prod f(e_i) \prod de_i,$$
$$x = [0, \theta]e = \theta e,$$

where $[0, \theta]$ is an element of the scale group.

(i) Determine the orbit of $x$ under the scale group (delete the origin 0).

(ii) Define a scale variable $s(x)$. Show that $(\sum x_i^2)^{1/2}$ is a scale variable.

(iii) Show that the orbits can be indexed by $d(x) = (x_1/s(x), \ldots, x_n/s(x))'$.

(iv) Derive the reduced model

$$k(d)\, s^n \prod f(s d_i)\, \frac{ds}{s},$$
$$s(x) = \theta s,$$

where $s = s(e)$ designates error position.

(v) Derive the structural distribution for $\theta$:

$$k(d) \left(\frac{s(x)}{\theta}\right)^{n} \prod f\left(\frac{x_i}{\theta}\right) \frac{d\theta}{\theta}.$$

(vi) Suppose that $f(e) = 0$ for $-\infty < e \le 0$. Show that a logarithmic transformation
applied to the variables in this problem transforms the multiplicative measurement model
into the simple measurement model.

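20. Consider the multiplicative measurement model with standard normal error. Show that
the structural distribution for $\theta$ is

$$\frac{1}{2^{n/2 - 1}\,\Gamma(n/2)} \left(\frac{\sum x_i^2}{\theta^2}\right)^{n/2} \exp\left\{-\frac{\sum x_i^2}{2\theta^2}\right\} \frac{d\theta}{\theta};$$

use $(\sum x_i^2)^{1/2}$ as scale variable.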

21. Consider the multiplicative measurement model with error distribution uniform on the
interval (0, 1).

(i) Derive the reduced model using $\max x_i = s(x)$ as scale variable.

(ii) Derive the structural distribution for $\theta$.

22. A radioactive source emitting particles at the average rate of one per unit time interval
gives the distribution

$$f(e)\,de = \exp\{-e\}\,de, \quad e > 0; \qquad = 0, \quad e < 0,$$

for the time interval $e$ between successive emissions. Let $\theta$ be the corresponding time interval
for a source under investigation, and let $x_1, \ldots, x_n$ be $n$ independent measurements of time
interval between successive emissions. This is the multiplicative measurement model

$$x = [0, \theta]e.$$

(i) Describe the orbits and reference points; use $\bar{x}$ as scale variable.

(ii) Derive the conditional error distribution $g(\bar{e} : d)\,d\bar{e}$.

(iii) Derive the structural distribution for $\theta$.

23. For the positive affine group show that two orbits are either identical or disjoint.

24. (i) Show that the following variables are transformation variables for the positive
affine group:

$$[b, s] = [x_1, |x_2 - x_1|],$$
$$[M, R] = \left[\frac{\max x_i + \min x_i}{2},\ \max x_i - \min x_i\right];$$

for the variable $[b, s]$ the exceptional set must be increased from the extended 1-vector to
the $(n - 1)$-dimensional subspace described by $x_2 - x_1 = 0$.

(ii) For the two transformation variables $[\bar{x}, s_x]$ and $[M, R]$ determine the connecting
transformation $[a, c]$,

$$[M, R] = [\bar{x}, s_x][a, c],$$

and show that $[a, c]$ is constant-valued on any orbit.

(iii) Let $[b_1(x), s_1(x)]$ and $[b_2(x), s_2(x)]$ be two transformation variables and $d_1(x)$ and
$d_2(x)$ the corresponding reference points. Show that

$$[b_1(x), s_1(x)] = [b_2(x), s_2(x)][a, c],$$
$$[a, c]\,d_1(x) = d_2(x),$$

where $[a, c]$ depends only on the orbit, that is, is constant-valued on any orbit. Draw a diagram
of an orbit as a half-plane and indicate the significance of the equations.

*25. Consider the measurement model with position variable $[b, s] = [x_1, |x_2 - x_1|]$ and
reference point

$$d = \left(\frac{x_1 - b}{s}, \ldots, \frac{x_n - b}{s}\right)'.$$

(i) Use the method of Section 7 to derive the conditional error distribution.

(ii) Find the form of the conditional error distribution for the normal example at the
beginning of Section 16; simplify by using the alternate coordinates $[\bar{e}, s_e]$ on the orbit
(see Problem 24).

26. (i) The positive affine transformations $[a, c]$ can be reexpressed in terms of matrices.
Show that the set

$$G = \left\{ \begin{pmatrix} 1 & 0 \\ a & c \end{pmatrix} : -\infty < a < \infty,\ 0 < c < \infty \right\}$$

with matrix multiplication as the operation forms a group having effectively the same
multiplication rule as the positive affine group.

(ii) Check that the measurement model can be reexpressed in terms of matrices and
matrix multiplication:

$$E = \begin{pmatrix} 1 & \cdots & 1 \\ e_1 & \cdots & e_n \end{pmatrix}, \qquad X = \begin{pmatrix} 1 & \cdots & 1 \\ x_1 & \cdots & x_n \end{pmatrix},$$
$$f(E)\,dE = \prod_1^n f(e_i) \prod_1^n de_i,$$
$$X = \theta E.$$

27. The positive affine transformations $[a, c]$ form a group

$$G = \{[a, c] : -\infty < a < \infty,\ 0 < c < \infty\}$$

with the multiplication rule

$$[a, c][a^*, c^*] = [a + ca^*, cc^*].$$

Let $A$ be a $p \times r$ matrix of real numbers, and $C$ be a $p \times p$ matrix of real numbers with
positive determinant. Show that the elements $[A, C]$ form a group (the generalized positive
affine group or the regression positive-linear group)

$$G = \{[A, C]\}:$$

the multiplication rule is

$$[A, C][A^*, C^*] = [A + CA^*, CC^*],$$

and the identity is $[0, I]$ where $I$ is the $p \times p$ identity matrix.

28. The generalized affine transformations $[A, C]$ can be reexpressed in terms of
$(p + r) \times (p + r)$ matrices. Show that the set

$$\left\{ \begin{pmatrix} I_r & 0 \\ A & C \end{pmatrix} : |C| > 0 \right\}$$

(with $I_r$ the $r \times r$ identity matrix) with matrix multiplication as the operation forms a
group having effectively the same multiplication rule as the generalized positive affine group.

29. A set $\{A_\alpha\}$ of subsets $A_\alpha$ of a space $\mathcal{X}$ is a partition if (a) any two subsets
$A_\alpha$, $A_\beta$ ($\alpha \ne \beta$) are disjoint: $A_\alpha \cap A_\beta = \emptyset$, where $\emptyset$ is the empty set;
(b) the union of all the sets $A_\alpha$ is the space $\mathcal{X}$: $\bigcup_\alpha A_\alpha = \mathcal{X}$.

(i) Show that the orbits in Section 3 form a partition of $R^n$.

(ii) Show that the orbits in Section 12 form a partition of $R^n$ (extended 1-vector deleted).

30. Show that a set $G$ of one-to-one transformations of a set $\mathcal{X}$ onto itself is a group if
and only if it is closed under the formation of products and inverses: if $g_1$, $g_2$ are in $G$, then
$g_1 g_2$ is in $G$; if $g$ is in $G$, then $g^{-1}$ is in $G$.

In the development of the measurement model the internal error of a system
was recognized as the primary entity. The error variable was introduced to
describe the response effect of the internal error; it is the basic ingredient of
the model.

The measurement model, however, corresponds to a rather special kind 
of system, a system with all controllable variables held constant and with a 
single real-valued response. This chapter introduces a general model, the 
structural model. The structural model corresponds to a general system 
with internal error, and it has an error variable to describe the response 
effect of the internal error. 

The development of the structural model follows very closely the pattern 
in Chapter One. The two measurement models were analyzed there in a form 
that would best illustrate the general methods and concepts of this chapter. 
Some of these methods and concepts are trivial for the simple measurement 
model; all are nontrivial for the measurement model. The structural model 
is developed without further illustrations; some simple extensions of the 
measurement model are introduced in the Problems. 

1 THE MODEL 

Consider a system operating under stable conditions. Suppose that
experience with such a system using appropriate measurement scales has
led to the identification of a response component of the internal error. Let this
be described by an error variable $E$ having a fixed distribution on the space
$\mathcal{X}$ of the response.

Suppose the general characteristics of the system are given by a quantity $\theta$,
a transformation belonging to a group $G$ of transformations of $\mathcal{X}$ onto $\mathcal{X}$.
To avoid triviality the group $G$ is assumed to be unitary on $\mathcal{X}$:

Definition 1. A group $G$ of transformations on $\mathcal{X}$ is unitary if $g_1 x = g_2 x$
for any $x$ implies $g_1 = g_2$.

A group is unitary if there is at most one transformation carrying any point
into any other point. The quantity $\theta$ applied to a value $E$ of the error variable
gives a value $X$ for the response: $X = \theta E$.

This description of the system produces the

Structural Model

$$E,$$
$$X = \theta E.$$

The structural model has two parts: an error variable $E$ with a known
distribution on the space $\mathcal{X}$; and a structural equation $X = \theta E$ in which a realized
value $E$ from the error variable gives the relation between the known response
$X$ in $\mathcal{X}$ and the unknown quantity $\theta$ in the group $G$ of transformations on $\mathcal{X}$.

Consider how the group $G$ affects the space $\mathcal{X}$. The transformations $g$
in $G$ carry a point $X$ into the orbit of $X$:

$$GX = \{gX : g \in G\}$$

(see Figure 1).

Suppose two points are related by a transformation:

$$X_1 = hX_2, \qquad X_2 = h^{-1}X_1.$$

Then any point generated from one can be generated from the other,

$$gX_1 = (gh)X_2, \qquad gX_2 = (gh^{-1})X_1,$$

and the orbit of $X_1$ is the same as the orbit of $X_2$. It follows then that any two
orbits are either identical or disjoint and that the orbits partition the space
$\mathcal{X}$ (Definition in Problem 29, Chapter One).
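The orbit argument can be checked numerically. The following is a minimal sketch, assuming the positive affine group of Chapter One acting on $R^n$ by $[a, c]x = a + cx$; the helper names `act`, `compose`, and `inverse` are hypothetical and chosen only for this illustration, and the sample point reuses the measurements 5.8, 6.5, 6.8 from the Problems.

```python
import numpy as np

def act(g, x):
    """Apply the positive affine transformation g = [a, c] to x: a + c*x."""
    a, c = g
    return a + c * np.asarray(x, dtype=float)

def compose(g, h):
    """Group product [a, c][a*, c*] = [a + c a*, c c*]."""
    (a, c), (a2, c2) = g, h
    return (a + c * a2, c * c2)

def inverse(g):
    """Inverse of [a, c] is [-a/c, 1/c]."""
    a, c = g
    return (-a / c, 1.0 / c)

x2 = np.array([5.8, 6.5, 6.8])
h = (2.0, 3.0)
x1 = act(h, x2)              # X1 = h X2, so X2 = h^{-1} X1
g = (-1.0, 0.5)

# Any point generated from X1 is generated from X2, and conversely.
same_from_x2 = np.allclose(act(g, x1), act(compose(g, h), x2))
same_from_x1 = np.allclose(act(g, x2), act(compose(g, inverse(h)), x1))
print(same_from_x2, same_from_x1)
```

Both checks hold for any choice of `g` and `h` with positive scale component, which is the content of the statement that the two orbits coincide.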



Figure 1 Orbit, reference point, and transformation variable.

Figure 2 Comparing two transformation variables and the corresponding reference points.

Consider a variable to describe position on an orbit:

Definition 2. A function $[X]$ defined on $\mathcal{X}$ and taking values in $G$ is a
transformation variable if

$$[gX] = g[X]$$

for all $g$ in $G$ and $X$ in $\mathcal{X}$.

A transformation variable $[X]$ leads to a reference point $D(X)$ on each
orbit (the point at which the variable equals the identity):

$$D(X) = [X]^{-1}X,$$
$$[D(X)] = [[X]^{-1}X] = [X]^{-1}[X] = i.$$

A transformation variable can always be defined by choosing a reference point
$D(X)$ on each orbit $GX$ and letting $[X]$ be the unique element in $G$ that carries
the reference point into $X$:

$$X = [X]D(X).$$

The reference points index the orbits, and the transformation variable gives
position on an orbit.
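As an illustration of Definition 2, the sketch below uses the transformation variable $[\bar{x}, s_x]$ for the positive affine group. It assumes $s_x^2 = \sum (x_i - \bar{x})^2/(n - 1)$, and the function names `position` and `reference_point` are hypothetical. It checks the defining property $[gX] = g[X]$ and that the reference point is the same at every point of an orbit.

```python
import numpy as np

def position(x):
    """Transformation variable [x] = [xbar, s_x] for the positive affine group."""
    x = np.asarray(x, dtype=float)
    return (x.mean(), x.std(ddof=1))

def reference_point(x):
    """D(x) = [x]^{-1} x: the point of the orbit at which the variable is the identity."""
    b, s = position(x)
    return (np.asarray(x, dtype=float) - b) / s

x = np.array([5.8, 6.5, 6.8])
g = (2.0, 3.0)                       # g = [a, c] with c > 0
gx = g[0] + g[1] * x

b, s = position(x)
# Defining property [gX] = g[X], where g[b, s] = [a + c b, c s].
equivariant = np.allclose(position(gx), (g[0] + g[1] * b, g[1] * s))
# The reference point does not change along the orbit and has position [0, 1].
same_reference = np.allclose(reference_point(x), reference_point(gx))
identity_position = np.allclose(position(reference_point(x)), (0.0, 1.0))
print(equivariant, same_reference, identity_position)
```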

Two transformation variables are simply related one to the other. Along any
orbit they differ by right multiplication by a group element:

$$X = [X]_1 D_1(X), \qquad X = [X]_2 D_2(X),$$
$$[X]_2 = [X]_1 [D_1(X)]_2,$$
$$[X]_1 = [X]_2 [D_2(X)]_1.$$

(See Figure 2.)
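The relation between two transformation variables can be verified on a small example. This sketch assumes the two variables $[\bar{x}, s_x]$ and $[x_1, |x_2 - x_1|]$ for the positive affine group (the second appears in Problem 24 of Chapter One); the names `pos1`, `pos2`, and `reference` are hypothetical.

```python
import numpy as np

def compose(g, h):
    """Group product [a, c][a*, c*] = [a + c a*, c c*]."""
    (a, c), (a2, c2) = g, h
    return (a + c * a2, c * c2)

def inverse(g):
    """Inverse of [a, c] is [-a/c, 1/c]."""
    a, c = g
    return (-a / c, 1.0 / c)

def pos1(x):
    """[x]_1 = [xbar, s_x]."""
    return (x.mean(), x.std(ddof=1))

def pos2(x):
    """[x]_2 = [x_1, |x_2 - x_1|], defined off the set x_2 = x_1."""
    return (x[0], abs(x[1] - x[0]))

def reference(pos, x):
    """Reference point D(x) = [x]^{-1} x for the transformation variable pos."""
    b, s = pos(x)
    return (x - b) / s

x = np.array([5.8, 6.5, 6.8])
y = 2.0 + 3.0 * x                    # another point on the same orbit

# [X]_2 = [X]_1 [D_1(X)]_2
relation_holds = np.allclose(pos2(x), compose(pos1(x), pos2(reference(pos1, x))))

# The connecting element [X]_1^{-1} [X]_2 is the same at every point of the orbit.
link_x = compose(inverse(pos1(x)), pos2(x))
link_y = compose(inverse(pos1(y)), pos2(y))
constant_on_orbit = np.allclose(link_x, link_y)
print(relation_holds, constant_on_orbit)
```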

Consider the structural model again:

$$E,$$
$$X = \theta E,$$

and let $[X]$ be a transformation variable. The points $X$ and $E$ are on the same
orbit:

$$GX = GE \quad \text{or} \quad D(X) = D(E).$$

The positions of the points $X$ and $E$ differ by a transformation $\theta$:

$$[X] = \theta[E].$$

The structural model can then be rewritten with composite structural
equation:

$$E,$$
$$[X] = \theta[E], \qquad GX = GE.$$

The quantity $\theta$ is an element of the group $G$. The positions $[E]$ and $[X]$ are
also elements of the group but designated $G^*$ to distinguish the use for
position as opposed to transformation.

The structural model can be written alternatively

$$E,$$
$$[X] = \theta[E], \qquad D(X) = D(E).$$

Consider a transformation $g$ and view it as providing new coordinates for
given entities:

$$\tilde{X} = gX, \qquad \tilde{\theta} = g\theta.$$

The structural model in the original coordinates is

$$E,$$
$$X = \theta E.$$

Multiplying the structural equation by $g$ gives $\tilde{X} = \tilde{\theta}E$. The structural model
then becomes

$$E,$$
$$\tilde{X} = \tilde{\theta}E.$$

The model thus has the same form in the new coordinates $\tilde{X}$, $\tilde{\theta}$ as in the
original coordinates $X$, $\theta$: the model is homogeneous under the group $G$.

2 THE REDUCED MODEL 

Consider an application of the structural model

$$E,$$
$$[X] = \theta[E], \qquad GX = GE,$$

to a system under stable conditions. The error variable $E$ describes the
response component of the internal error; it describes the essentials of the
random process that generated the realized $E$ in the structural equation. The
structural equation gives the relation between the known $X$ and the unknowns
$\theta$ and $E$.

Now suppose the system is being examined in isolation with no outside
information concerning the unknowns; and consider the information in
the structural equation concerning the unknowns $\theta$ and $E$.

The orbit of $E$ is known: $GE = GX$. And the information about the orbit
is in the form of an event based on the variable $GE$ for the random process $E$.
The position of $E$ on its orbit, however, is not known:

$$[E] = \theta^{-1}[X] = g[X];$$

the structural equation represents $[E]$ as an unknown transformation $g$
applied to the known $[X]$. If the known position were different, $[\tilde{X}] = h[X]$ for
example, then the description of $[E]$ would be

$$[E] = \theta^{-1}h[X] = \tilde{g}[X],$$

where $\tilde{g}$ is again an unknown transformation in $G$. Different values for position
would give the same description of $[E]$. There is thus no information in the
structural equation concerning the position of $E$ on its orbit.

The error distribution describes the random process that generated the
unknown $E$ in the structural equation. The only other information concerning
the unknown $E$ has the form $GE = GX$, an event for the random process
that generated $E$. The conditions are fulfilled for making probability
statements concerning unknown constants: exact probability statements can be
made concerning the unknown error $E$; they are based on the conditional
distribution of the error variable $E$, given the orbit $GE = GX$. The structural
model by its own information content produces the

Reduced Structural Model

$$[E] : GE = GX,$$
$$[X] = \theta[E].$$

The reduced model has two parts: an error probability distribution (for the
variable $[E]$ given $GE = GX$) which provides probability statements for the
unknown position $[E]$ in the structural equation; and a structural equation
which gives the relation between the known $[X]$ in $G^*$ and the unknowns $\theta$
in $G$ and $[E]$ in $G^*$ (with $[E]$ as a constant). (See Figure 3.)

3 INVARIANT DIFFERENTIALS 

In Chapter One the conditional error distribution for the measurement
model was derived by means of invariant differentials. An element about an
initial reference point was transformed along orbits; its volume was measured
invariantly in terms of original coordinates and invariantly in terms of orbit
and position coordinates; the equivalence of the two invariant measures
gave the change of differential from original coordinates to position
coordinates given orbit; the probability element (in terms of position
coordinates given orbit) was normalized over the range of the position
coordinates. This procedure is now used for the structural model.

Figure 3 The orbit of $X$ and $E$; the known $[X]$ in $G^*$, the unknowns $\theta$ in $G$ and $[E]$ in $G^*$.

Consider a unitary group $G$ of one-to-one transformations of $\mathcal{X}$ onto $\mathcal{X}$
and suppose the following assumption holds:

Assumption 3.† $\mathcal{X}$ is an open set in Euclidean space $R^N$; $G$ is an open
set in $R^L$; the transformations

$$\tilde{g} = hg, \qquad \tilde{X} = hgX$$

are continuously differentiable with respect to $g$, $h$, and $X$; and $[X]$ is a
continuous transformation variable on $\mathcal{X}$.

Consider a neighborhood or element $V$ at a reference point $D(X) =
[X]^{-1}X$; and consider the effect of transformations in $G$ on this element (see
Figure 4).

† The methods and results remain valid if $\mathcal{X}$ and $G$ are Euclidean manifolds; they also remain
valid if $\mathcal{X}$ and $G$ are topological spaces provided derivatives are replaced by appropriate
Radon-Nikodym derivatives relative to a given measure on $\mathcal{X}$.

Figure 5 The invariant differential that coincides with $dX$ at the reference point $D$.

Using the factor $J_N(X)$ as a compensating factor gives the invariant differential

$$\frac{dX}{J_N(X)} = dm(X),$$

a measure of volume that is constant under any transformation $h$ in $G$:

$$dm(hX) = \frac{d\,hX}{J_N(hX)} = \frac{J_N(h : X)\,dX}{J_N(h : X)\,J_N(X)} = \frac{dX}{J_N(X)} = dm(X).$$

The construction shows that it is the unique invariant differential that
coincides with Euclidean volume at the reference point $D(X)$. (See Figure 5.)

As an example consider the positive affine group with transformation
variable $[\bar{x}, s_x]$ (Section 12, Chapter One):

$$d\,[a, c]x = c^n\,dx,$$
$$J_n([a, c] : x) = c^n,$$
$$J_n(x) = s_x^n,$$
$$dm(x) = \frac{dx}{s_x^n}.$$


Consider now the effect of transformations in $G$ as applied to coordinates
describing orbit and position on orbit. The transformations do not affect
coordinates describing orbit: transformations carry points along orbits.
Accordingly, any differential in terms of coordinates describing orbit is an
invariant differential.

The transformations affect only position on orbit. The transformation $h$
applied to the position $[X]$ changes Euclidean volume (in $R^L$),

$$d\,h[X] = \left|\frac{\partial\,h[X]}{\partial [X]}\right| d[X],$$

by the positive-Jacobian factor

$$J_L(h : [X]) = \left|\frac{\partial\,h[X]}{\partial [X]}\right|.$$

The particular transformation $[X]$ applied to the reference value $i$ changes
volume by the factor

$$J_L([X]) = J_L([X] : i),$$

which can be used as a compensating factor to produce an invariant
differential

$$d\mu([X]) = \frac{d[X]}{J_L([X])},$$

a measure of volume that is constant under any transformation in $G$:

$$d\mu(h[X]) = \frac{d\,h[X]}{J_L(h[X])} = \frac{J_L(h : [X])\,d[X]}{J_L(h : [X])\,J_L([X])} = \frac{d[X]}{J_L([X])} = d\mu([X]).$$

The construction shows that it is the unique invariant differential that
coincides with Euclidean volume at the identity $i$. (See Figure 6.) And the
method of construction shows that any other invariant differential differs
only by a constant of proportionality, the constant being the ratio of the
differentials at the identity.
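A small numerical check may help fix the idea. For the positive affine group, left multiplication $[p, q][a, c] = [p + qa, qc]$ has Jacobian $q^2$, so the construction gives $J_2([a, c]) = c^2$ and the invariant differential $da\,dc/c^2$. The sketch below (with hypothetical helper names) compares the Euclidean area and the invariant measure of a rectangle of positions before and after a left transformation; the rectangle and the element `h` are arbitrary choices made for illustration.

```python
def left_mult(h, g):
    """Left multiplication h g in the positive affine group: [p, q][a, c] = [p + q a, q c]."""
    (p, q), (a, c) = h, g
    return (p + q * a, q * c)

def euclid_area(a0, a1, c0, c1):
    """Euclidean area of the rectangle [a0, a1] x [c0, c1] of positions [a, c]."""
    return (a1 - a0) * (c1 - c0)

def mu_measure(a0, a1, c0, c1):
    """Integral of da dc / c^2 over the rectangle [a0, a1] x [c0, c1]."""
    return (a1 - a0) * (1.0 / c0 - 1.0 / c1)

h = (2.0, 3.0)
rect = (0.0, 1.0, 0.5, 2.0)               # a0, a1, c0, c1
# Left multiplication maps the rectangle onto another rectangle; map two opposite corners.
lo = left_mult(h, (rect[0], rect[2]))
hi = left_mult(h, (rect[1], rect[3]))
image = (lo[0], hi[0], lo[1], hi[1])

area_ratio = euclid_area(*image) / euclid_area(*rect)    # Jacobian factor q^2
mu_before, mu_after = mu_measure(*rect), mu_measure(*image)
print(area_ratio, mu_before, mu_after)
```

The Euclidean area is multiplied by $q^2$ while the compensated measure is unchanged, which is the invariance property stated above.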

As an example consider further the positive affine group:

$$d\,[a, c][\bar{x}, s_x] = c^2\,d[\bar{x}, s_x],$$
$$J_2([a, c] : [\bar{x}, s_x]) = c^2,$$
$$J_2([\bar{x}, s_x]) = s_x^2,$$
$$d\mu([\bar{x}, s_x]) = \frac{d[\bar{x}, s_x]}{s_x^2} = \frac{d\bar{x}\,ds_x}{s_x^2}.$$

Figure 6 The invariant differential on $G^*$ that coincides with $d[X]$ at the identity $i$.

Now consider the two invariant differentials as they apply to the element
$V$ and to images of $V$ under transformations in $G$. At the reference point
$D = D(X)$, let $\delta(D)$ be the ratio:

$$dm(X) = \delta(D)\,d\mu([X]) = \delta(D)\,\frac{d[X]}{J_L([X])}.$$

The differentials, however, are invariant under transformations in $G$. The
equality then holds throughout the orbit; the ratio $\delta(D)$ is a differential that
measures $V$ at right angles to the orbit.† This provides the change of variables
from a volume element in the original coordinates to a volume element in
terms of position coordinates conditional on a neighborhood of the orbit
$D(X) = D$:

$$dm(X) = \delta(D)\,d\mu([X]),$$
$$dX = \delta(D)\,\frac{J_N(X)}{J_L([X])}\,d[X].$$

As an example consider further the positive affine group:

$$dm(x) = \delta(d)\,d\mu([\bar{x}, s_x]),$$
$$dx = \delta(d)\,s_x^{n-2}\,d\bar{x}\,ds_x.$$

The factor $\delta(d)$ gives a measure of area on the sphere $s_x = 1$ in the subspace
$\sum x_i = 0$; compare with Figure 15 in Chapter One. The variable $\sqrt{n}\,\bar{x}$
measures Euclidean distance in the direction of the 1-vector; the variable
$\sqrt{n - 1}\,s_x$ measures Euclidean distance orthogonal to the 1-vector (radially
from the 1-vector): $\delta(d)/(\sqrt{n}\,\sqrt{n - 1})$ measures Euclidean area on the sphere
$s_x = 1$ (with radius $\sqrt{n - 1}$) in the subspace $\sum x_i = 0$.

4 THE ERROR PROBABILITY DISTRIBUTION 

The conditional error density can now be derived. Assume that Assumption
3 in Section 3 holds and that the variable $E$ has a density function $f(E)$ with
respect to Euclidean volume on $\mathcal{X}$.

The probability element for $E$ is

$$f(E)\,dE = f(E)\,J_N(E)\,dm(E) = \tilde{f}(E)\,dm(E);$$

the modified $\tilde{f}(E) = f(E)\,J_N(E)$ is a density with respect to the invariant
differential $dm(E)$. The probability element can be expressed in terms of orbit
$D = [E]^{-1}E$ and position $[E]$ by using results from the preceding section:

$$f(E)\,dE = f([E]D)\,\frac{J_N(E)}{J_L([E])}\,\delta(D)\,d[E].$$

† The differential $\delta(D)$ can be written $\delta(D) = \delta_1(D)\,\delta_2$ where: (a) $\delta_2\,d\mu([X])$ at the
identity gives $L$-dimensional Euclidean volume along orbit as calculated using the
coordinates of $R^N$; and (b) $\delta_1(D)$ measures Euclidean volume at $D$ in the $(N - L)$-dimensional
space orthogonal to the orbit.

The conditional probability element is then obtained by normalization:

$$g([E] : D)\,d[E] = k(D)\,f([E]D)\,\frac{J_N(E)}{J_L([E])}\,d[E]$$
$$= \tilde{g}([E] : D)\,d\mu([E])$$
$$= k(D)\,\tilde{f}([E]D)\,d\mu([E]),$$
$$k^{-1}(D) = \int_{G^*} f([E]D)\,\frac{J_N(E)}{J_L([E])}\,d[E].$$

The reduced structural model can now be given as

Reduced Structural Model

$$g([E] : D(X))\,d[E],$$
$$[X] = \theta[E].$$

The model has two parts: an error probability distribution $g([E] : D)\,d[E]$
($[E]$ is a variable on $G^*$) which provides probability statements for the
unknown $[E]$ in the structural equation; and a structural equation which gives
the relation between the known $[X]$ in $G^*$ and the unknowns $\theta$ in $G$ and $[E]$
in $G^*$ ($[E]$ is a constant).
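The normalization that defines $k(D)$ can be carried out numerically. The sketch below is an illustration under stated assumptions, not the book's own computation: it takes the positive affine group with position $[\bar{e}, s_e]$, for which the examples in Section 3 give $J_N(E) = s_e^n$ and $J_L([E]) = s_e^2$, and it assumes a standard normal component error density. The grid limits and the Riemann-sum integration are arbitrary choices.

```python
import numpy as np

def normal_density(e):
    """Standard normal component error density f(e) (assumed for illustration)."""
    return np.exp(-0.5 * e**2) / np.sqrt(2.0 * np.pi)

x = np.array([5.8, 6.5, 6.8])
n = x.size
d = (x - x.mean()) / x.std(ddof=1)        # reference point D(x) = [x]^{-1} x

# Grid over the position [ebar, s_e] of the error on its orbit.
ebar = np.linspace(-6.0, 6.0, 601)
se = np.linspace(0.005, 8.0, 800)
EB, SE = np.meshgrid(ebar, se, indexing="ij")

# f([E]D) J_N(E) / J_L([E]) = prod f(ebar + s_e d_i) * s_e^n / s_e^2.
joint = np.ones_like(EB)
for di in d:
    joint *= normal_density(EB + SE * di)
unnormalized = joint * SE ** (n - 2)

cell = (ebar[1] - ebar[0]) * (se[1] - se[0])
k = 1.0 / (unnormalized.sum() * cell)      # normalizing constant k(D), by Riemann sum
g = k * unnormalized                       # conditional density g([E] : D)

var_ebar = (g * EB**2).sum() * cell        # close to 1/n for normal error
print(k, var_ebar)
```

For normal error the conditional distribution of $\bar{e}$ has variance $1/n$, which gives a simple check on the numerical density; for other error densities only `normal_density` would need to change.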

Some distributions connected with the conditional error distribution need
further results concerning invariant differentials. In Section 3 the
transformation

$$\tilde{g} = hg$$

involving left multiplication by the group element $h$ was examined. And the
invariant differential, more correctly the left invariant differential, was
derived:

$$d\mu(g) = \frac{dg}{J_L(g)}.$$

If the variable $g$ is used as a position variable on an orbit, then the
transformation $\tilde{g} = hg$ can be viewed as coming from a transformation $\tilde{X} =
hX$ on the space $\mathcal{X}$, and it gives position of a new point relative to a fixed
reference point (Figure 7).

Now consider the transformation

$$\tilde{g} = gh$$

involving right multiplication by the group element $h$. If $g$ is being used as a
position variable on an orbit, then the “transformation” $\tilde{g} = gh$ can be viewed
as a change in reference point from $D$ to $h^{-1}D$ (see Figure 8). The right
“transformation” $h$ changes Euclidean volume,

$$d\,gh = \left|\frac{\partial\,gh}{\partial g}\right| dg,$$

by the positive-Jacobian factor

$$J_L^*(h : g) = \left|\frac{\partial\,gh}{\partial g}\right|.$$

A composite “transformation” $h_1 h_2$ changes volume by the factor

$$J_L^*(h_1 h_2 : g) = J_L^*(h_2 : gh_1)\,J_L^*(h_1 : g).$$

The particular “transformation” $g$ applied to the reference value $i$ changes
volume by the factor

$$J_L^*(g) = J_L^*(g : i),$$

which can be used as a compensating factor to produce the right invariant
differential,

$$d\nu(g) = \frac{dg}{J_L^*(g)}.$$

Figure 7 A left transformation: $\tilde{g} = hg$. The same volume measure at the new point as at
the old point.

The left and right invariant differentials have some simple interconnecting
properties. With $g$ as variable the differential

$$d\mu_{g_0}(g) = d\mu(gg_0)$$

is left invariant:

$$d\mu_{g_0}(hg) = d\mu(hgg_0) = d\mu(gg_0) = d\mu_{g_0}(g).$$

Accordingly it must differ from the left invariant differential $d\mu(g)$ by a
constant of proportionality, the ratio of volume measures at the identity:

$$d\mu(gg_0) = \Delta(g_0)\,d\mu(g).$$

The proportionality factor $\Delta(g)$ is called the modular function of the group.
It has the following properties:

$$\Delta(i) = 1,$$
$$\Delta(g_1 g_2) = \Delta(g_1)\,\Delta(g_2),$$
$$\Delta(g^{-1}) = \Delta^{-1}(g).$$

The factor $\Delta(g_0)$ measures a change under a right transformation; it can
therefore be used as a compensating factor to construct a right invariant
differential

$$d\nu_1(g) = \frac{d\mu(g)}{\Delta(g)} = \Delta^{-1}(g)\,d\mu(g).$$

The differential agrees with Euclidean volume at the identity; hence
$d\nu_1(g) = d\nu(g)$.

A right invariant differential can also be constructed by assigning to a
differential change at $g$ the left invariant measure of the corresponding element
at $g^{-1}$:

$$d\nu_2(g) = d\mu(g^{-1}).$$

This element is right invariant:

$$d\nu_2(gh) = d\mu(h^{-1}g^{-1}) = d\mu(g^{-1}) = d\nu_2(g).$$

At the identity $i$ it agrees with Euclidean volume: consider an element $V$ at $i$;
symmetrize the element,

$$V_s = \{g : g \in V \text{ or } g^{-1} \in V\};$$

the $\mu$ measure and $\nu_2$ measure of $V_s$ are equal; hence $d\nu_2(g) = d\nu(g)$.
The invariant differentials can be interrelated:

$$d\mu(g) = \Delta(g)\,d\nu(g) = d\nu(g^{-1}),$$
$$d\nu(g) = \Delta^{-1}(g)\,d\mu(g) = d\mu(g^{-1}).$$

The modular function has the form

$$\Delta(g) = \frac{J_L^*(g)}{J_L(g)}.$$

For the positive affine group

$$\Delta([a, c]) = \frac{J_2^*([a, c])}{J_2([a, c])} = \frac{c}{c^2} = \frac{1}{c},$$
$$d\mu([a, c]) = \frac{da\,dc}{c^2},$$
$$d\nu([a, c]) = \frac{da\,dc}{c}.$$
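The Jacobian factors for the positive affine group can be checked by numerical differentiation. The following sketch computes $J_2(g) = J_2(g : i)$ and $J_2^*(g) = J_2^*(g : i)$ by central differences at the identity $i = [0, 1]$ and forms their ratio; the function names, the step size, and the particular element `g` are assumptions made for this illustration.

```python
import numpy as np

def compose(g, h):
    """Group product [a, c][a*, c*] = [a + c a*, c c*] on arrays of length 2."""
    return np.array([g[0] + g[1] * h[0], g[1] * h[1]])

def jacobian(fn, u, eps=1e-6):
    """Absolute determinant of the derivative of fn at u, by central differences."""
    u = np.asarray(u, dtype=float)
    cols = []
    for k in range(2):
        step = np.zeros(2)
        step[k] = eps
        cols.append((fn(u + step) - fn(u - step)) / (2.0 * eps))
    return abs(np.linalg.det(np.column_stack(cols)))

identity = np.array([0.0, 1.0])
g = np.array([1.5, 4.0])

# J_2(g): volume change at the identity when g multiplies on the left.
J_left = jacobian(lambda u: compose(g, u), identity)
# J_2*(g): volume change at the identity when g multiplies on the right.
J_right = jacobian(lambda u: compose(u, g), identity)
modular = J_right / J_left               # modular function evaluated at g
print(J_left, J_right, modular)
```

With $c = 4$ the three values agree with $c^2$, $c$, and $1/c$ up to rounding.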


5 GENERAL INFERENCE 

The structural model by its own information content produces the reduced
model

$$[E] : GE = GX,$$
$$[X] = \theta[E].$$

Consider the information in this model concerning the unknown quantity $\theta$.

The value of $[X]$ in the structural equation is known. Each possible value
for the unknown $[E]$ corresponds to a possible value for $\theta$:

$$\theta = [X][E]^{-1}, \qquad [E] = \theta^{-1}[X].$$

A probability statement concerning $[E]$ is ipso facto a probability statement
concerning $\theta$. The probability distribution describing the unknown $[E]$ thus
gives a distribution, the structural distribution, that describes the unknown $\theta$.
The structural distribution is obtained from the error probability distribution
of $[E]$ by the map

$$\theta = [X][E]^{-1}$$

from $[E]$ in $G^*$ to $\theta$ in $G$.
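The map from $[E]$ to $\theta$ can be illustrated by simulation. This sketch assumes the measurement model with standard normal error, for which the error position $[\bar{e}, s_e]$ has $\bar{e}$ normal with variance $1/n$ and $(n - 1)s_e^2$ chi-square on $n - 1$ degrees of freedom, independently. It uses the seven temperature determinations from Problem 10 of Chapter One; the sample size `m` and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([683, 688, 683, 687, 692, 687, 682], dtype=float)
n = x.size
xbar, sx = x.mean(), x.std(ddof=1)       # known position [X] = [xbar, s_x]

# Simulated values of the error position [E] = [ebar, s_e] under normal error.
m = 200_000
ebar = rng.normal(0.0, 1.0 / np.sqrt(n), size=m)
se = np.sqrt(rng.chisquare(n - 1, size=m) / (n - 1))

# theta = [X][E]^{-1} = [xbar, s_x][-ebar/s_e, 1/s_e] = [xbar - s_x ebar/s_e, s_x/s_e].
sigma = sx / se
mu = xbar - sx * ebar / se

# Central 90% intervals read from the simulated structural distribution.
mu_lo, mu_hi = np.quantile(mu, [0.05, 0.95])
sig_lo, sig_hi = np.quantile(sigma, [0.05, 0.95])
print((mu_lo, mu_hi), (sig_lo, sig_hi))
```

Each simulated pair satisfies the structural equation $[\bar{x}, s_x] = [\mu, \sigma][\bar{e}, s_e]$ exactly, so the spread of `mu` and `sigma` is the spread of the error position carried over to $\theta$.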

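Now suppose that Assumption 3 in Section 3 holds and that $E$ has a density
$f(E)$ with respect to Euclidean volume. The reduced model is

$$\tilde{g}([E] : D(X))\,d\mu([E]) = k(D)\,f([E]D)\,J_N(E)\,d\mu([E]),$$
$$[X] = \theta[E].$$

The structural distribution is obtained by substituting:

$$[E] = \theta^{-1}[X],$$
$$d\mu([E]) = \Delta([X])\,d\mu(\theta^{-1})$$
$$= \Delta([X])\,d\nu(\theta)$$
$$= \Delta(\theta^{-1}[X])\,d\mu(\theta).$$

The structural probability element for $\theta$ on the space $G$ is

$$g^*(\theta : X)\,d\theta = \tilde{g}(\theta^{-1}[X] : D)\,\Delta(\theta^{-1}[X])\,d\mu(\theta)$$
$$= k(D)\,f(\theta^{-1}X)\,J_N(\theta^{-1}X)\,\Delta(\theta^{-1}[X])\,d\mu(\theta)$$
$$= k([X]^{-1}X)\,f(\theta^{-1}X)\,J_N(\theta^{-1}X)\,\frac{J_L^*(\theta^{-1}[X])}{J_L(\theta^{-1}[X])}\,\frac{d\theta}{J_L(\theta)}$$
$$= k([X]^{-1}X)\,f(\theta^{-1}X)\,J_N(\theta^{-1}X)\,\frac{J_L^*([X])}{J_L([X])\,J_L^*(\theta)}\,d\theta.$$

As an example, the structural probability element for the measurement
model can be obtained by substitution:

$$g^*([\mu, \sigma] : x)\,d\mu\,d\sigma = k(d) \prod_1^n f([\mu, \sigma]^{-1}x_i) \left(\frac{s_x}{\sigma}\right)^{n} \frac{d\mu\,d\sigma}{s_x\,\sigma}$$
$$= k(d) \prod_1^n f\left(\frac{x_i - \mu}{\sigma}\right) \frac{s_x^{n-1}}{\sigma^{n+1}}\,d\mu\,d\sigma.$$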


6 TESTS OF SIGNIFICANCE 

Consider the two tests of significance for the measurement model in
Section 17 in Chapter One. The first test concerned the hypothesis $\mu = \mu_0$
and was based on the value of the error quantity $\bar{e}/s_e$. The possible hypotheses
of the form $\mu = \mu_0$ produce a partition of $G$; and the possible values for the
error quantity $\bar{e}/s_e$ produce a partition of $G^*$ (see Figure 9).

The second test concerned the hypothesis $\sigma = \sigma_0$ and was based on the
value of the error quantity $s_e$. The possible hypotheses of the form $\sigma = \sigma_0$
produce a partition of $G$; and the possible values of $s_e$ produce a partition
of $G^*$ (see Figure 10).

Now consider the structural model and suppose that the space $G$ is
partitioned into disjoint sets; let $H(\theta)$ be the set containing $\theta$ (see Figure 11).

Consider the hypothesis $H(\theta) = H_0$. This hypothesis combined with the
structural equation $[X] = \theta[E]$ or $[E] = \theta^{-1}[X]$ gives the information that
the unknown $[E]$ is in the set

$$\{\theta^{-1}[X] : H(\theta) = H_0\} = \{\theta^{-1}[X] : \theta \in H_0\} = H_0^{-1}[X]$$

(note that $H^{-1} = \{g^{-1} : g \in H\}$ is the set of inverses of elements of $H$).
The sets $H = H(\theta)$ form a partition of $G$; the corresponding sets $H^{-1}[X]$
form a partition $P$ on $G^*$, a reflection about $[X]$ of the partition on $G$ (see
Figure 11).

The information concerning $[E]$ is that $[E]$ is in a set $H_0^{-1}[X]$, a set in the
partition $P$ of $G^*$ into components $H^{-1}[X]$. Consider what the information
concerning $[E]$ would be if $[X]$ were different. If $[X]$ were different, $g[X]$
for example, then the information would be that $[E]$ is in the set $H_0^{-1}g[X]$.
If the presentation of information has the form of events, then the sets
$H_0^{-1}g[X]$ (various $g$) must all be sets in the initial partition $P = \{H^{-1}[X]\}$ of
$G^*$. This implies

$$H_0^{-1}g[X] = H^{-1}[X], \quad \text{for some } H,$$
$$H_0^{-1}g = H^{-1}, \quad \text{for some } H,$$
$$g^{-1}H_0 = H, \quad \text{for some } H.$$

Thus the left multiples $gH_0$ of $H_0$ must all be sets in the partition $\{H(\theta)\}$. By
group theory it follows that the partition $\{H(\theta)\}$ consists of left cosets $gH$ of a
subgroup $H$ of $G$. See Problems 12 and 13.

Figure 9 The hypothesis $\mu = \mu_0$ and a partition of $G$. The value of the error characteristic
$\bar{e}/s_e$ and a partition of $G^*$.

For the measurement model example, the hypothesis $\mu = \mu_0$ can be
expressed as

$$[\mu, \sigma] \in [\mu_0, 1]H_2,$$

a left coset of the scale group $H_2$. Show that the hypothesis $\sigma = \sigma_0$ can also be
put in left coset form.

Suppose now that some outside source has indicated that $\theta$ is in the left
coset $g_0H$ of a subgroup $H$ of $G$. The hypothesis $\theta \in g_0H$ gives information
concerning the unknown error $[E]$ in the structural equation: the error $[E]$
is in the set

$$\{\theta^{-1}[X] : \theta \in g_0H\} = (g_0H)^{-1}[X]$$
$$= H^{-1}g_0^{-1}[X]$$
$$= Hg_0^{-1}[X],$$

which is a right coset or orbit on $G^*$. (Note that $H^{-1} = H$ since $H$ is a group.)
Thus the hypothesis $\theta \in g_0H$ (a left coset on $G$) gives the information that
$[E]$ is on the orbit $Hg_0^{-1}[X]$ (a right coset on $G^*$). This information has the
form

$$H[E] = Hg_0^{-1}[X];$$

it is an event for the error variable and it uses the orbital variable $H[E]$ on
$G^*$ or $HE$ on $\mathcal{X}$.

Let $t([E])$ be a variable that indexes the $H$ orbits on $G^*$. The hypothesis
$\theta \in g_0H$ together with the structural equation leads to the value

$$t([E]) = t(g_0^{-1}[X]).$$

This value can be compared with the distribution of the variable $t([E])$
derived from the error probability distribution; and the hypothesis can be
assessed accordingly.

For the measurement model example the information that $\bar{e}/s_e$ is equal to

$$\frac{\bar{x} - \mu_0}{s_x}$$

is equivalent to the information that $[\bar{e}, s_e]$ is on the orbit

$$H_2[\mu_0, 1]^{-1}[\bar{x}, s_x] = H_2[\bar{x} - \mu_0, s_x],$$

a right coset of the scale group $H_2$ (see Figure 9).
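The test can be sketched numerically for the hypothesis $\mu = 680$ with the temperature data of Problem 10, Chapter One. The code assumes standard normal error (so that the error distribution of $\bar{e}/s_e$ can be simulated from a normal and a chi-square variable) and uses hypothetical helper names; the tail proportion is a simulation estimate, not an exact value.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([683, 688, 683, 687, 692, 687, 682], dtype=float)
n = x.size
xbar, sx = x.mean(), x.std(ddof=1)
mu0 = 680.0

def compose(g, h):
    """Group product [a, c][a*, c*] = [a + c a*, c c*]."""
    (a, c), (a2, c2) = g, h
    return (a + c * a2, c * c2)

def inverse(g):
    """Inverse of [a, c] is [-a/c, 1/c]."""
    a, c = g
    return (-a / c, 1.0 / c)

def t_index(pos):
    """Index t([e]) = ebar / s_e of the orbits of the scale group H_2 on G*."""
    b, s = pos
    return b / s

# Any element g0 = [mu0, c] of the left coset [mu0, 1] H_2 leads to the same value.
t_obs = t_index(compose(inverse((mu0, 1.0)), (xbar, sx)))
t_alt = t_index(compose(inverse((mu0, 2.5)), (xbar, sx)))

# Error distribution of ebar / s_e under normal error, by simulation.
m = 200_000
ebar = rng.normal(0.0, 1.0 / np.sqrt(n), size=m)
se = np.sqrt(rng.chisquare(n - 1, size=m) / (n - 1))
tail = np.mean(np.abs(ebar / se) >= abs(t_obs))     # two-sided tail proportion
print(t_obs, tail)
```

The equality of `t_obs` and `t_alt` reflects the fact that the value depends only on the coset $g_0H_2$, not on the representative $g_0$.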

*7 CONDITIONING BY OUTSIDE INFORMATION 

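The measurement model with normal error is a simple example to illustrate conditioning. In reduced form the model is

$$[\bar x, s_x] = [\mu, \sigma][\bar e, s_e]$$

(Section 16 in Chapter One). The corresponding structural distribution is

$$\mu = \bar x - \frac{\bar e}{s_e}\, s_x, \qquad \sigma = \frac{s_x}{s_e},$$

where $[\bar e, s_e]$ has the error probability distribution (Section 18 in Chapter One).

Suppose the information $\sigma = \sigma_0$ becomes available. This information has the form of an event for the error variable (Section 6); it leads to the value

$$s_e = \frac{s_x}{\sigma_0}$$

for the error characteristic $s_e$. As an event it can be used to condition the error distribution; with normal error $\bar e$ and $s_e$ are independent, and it gives

$$\bar e = \frac{z}{\sqrt n}, \qquad s_e = \frac{s_x}{\sigma_0},$$

where $z$ is standard normal, which then gives

$$\mu = \bar x - \frac{z/\sqrt n}{s_x/\sigma_0}\, s_x = \bar x - \sigma_0 \frac{z}{\sqrt n}, \qquad \sigma = \frac{s_x}{s_x/\sigma_0} = \sigma_0$$

for the structural distribution. The model with the information $\sigma = \sigma_0$ is the same as the simple measurement model

$$x = \mu + e$$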

in Sections 8, 10 of Chapter One.

Consider the structural model in reduced form:

$$[E] : GE = GX,$$

$$[X] = \theta[E],$$

and suppose there is information concerning $\theta$ in the form of an event. By Section 6 the information has the form $\theta \in g_0 H_1$, where $H_1$ is a subgroup of $G$.

In the typical case involving a subgroup $H_1$ in a group $G$ there is a complementary subgroup $H_2$ such that each element $g$ in $G$ can be written uniquely as a product:

$$g = kh,$$

where $k$ is in $H_2$ and $h$ is in $H_1$. This kind of decomposition† for $G$ leads to convenient notation; it is examined here in lieu of the general case. Suppose $G$ can be expressed in this manner, $g = kh$, and let

$$[g]_2 = k, \qquad [g]_1 = h,$$

$$g = [g]_2\,[g]_1.$$

The inverse of an element $g$ is

$$g^{-1} = [g]_1^{-1}\,[g]_2^{-1}.$$

An inverse can be any element in $G$; accordingly the decomposition of $G$ can be made in the reverse order:

$$g = {}_1[g]\;{}_2[g],$$

where ${}_1[g]$ is in $H_1$ and ${}_2[g]$ is in $H_2$, and

$${}_1[g^{-1}] = [g]_1^{-1}, \qquad {}_2[g^{-1}] = [g]_2^{-1}.$$

(See Figure 12.) 
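The two orders of decomposition can be illustrated with the positive affine group, taking the location group as $H_1$ and the scale group as $H_2$ as in Problem 16. This is a minimal sketch under that assumption; the function names are hypothetical, and group elements are coded as pairs `(a, c)` with product $[a, c][a', c'] = [a + ca', cc']$.

```python
from fractions import Fraction

def compose(g, h):
    """Product [a, c][a2, c2] = [a + c*a2, c*c2]."""
    (a, c), (a2, c2) = g, h
    return (a + c * a2, c * c2)

def inverse(g):
    """Inverse [a, c]^{-1} = [-a/c, 1/c]."""
    a, c = g
    return (-a / c, 1 / c)

def scale_then_location(g):
    """g = k h with k = [0, c] in the scale group and h = [a/c, 1] in the location group."""
    a, c = g
    return (Fraction(0), c), (a / c, Fraction(1))

def location_then_scale(g):
    """Reverse order: g = h k with h = [a, 1] and k = [0, c]."""
    a, c = g
    return (a, Fraction(1)), (Fraction(0), c)

g = (Fraction(3), Fraction(2))
k, h = scale_then_location(g)
h_rev, k_rev = location_then_scale(g)

# Both factorizations reproduce g.
ok_first = compose(k, h) == g
ok_reverse = compose(h_rev, k_rev) == g

# Components of g^{-1} in the reverse order are the inverses of the
# components of g in the first order.
ok_inverse = location_then_scale(inverse(g)) == (inverse(h), inverse(k))
print(ok_first, ok_reverse, ok_inverse)
```

Note that the scale component is $[0, c]$ in both orders, which reflects the normality of the location group.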

The quantity $\theta$ in $G$ can be represented in terms of elements of $H_2$ and $H_1$:

$$\theta = \tau\varphi, \qquad \tau = [\theta]_2, \qquad \varphi = [\theta]_1.$$

The information $\theta \in g_0 H_1$ can then be given as $\tau = \tau_0$, where $\tau_0 = [g_0]_2$ and

$$g_0 H_1 = [g_0]_2\,[g_0]_1 H_1 = \tau_0 H_1.$$

(See Figure 13.)

† $G$ is called the semidirect product of the subgroups $H_2$ and $H_1$.

Figure 12 $G$ expressed as a semidirect product of $H_2$ and $H_1$ and as a semidirect product of $H_1$ and $H_2$.


The information $\tau = \tau_0$ gives information concerning the unknown error position $[E]$. The structural equation in the form $[E] = \theta^{-1}[X]$ can be written

$$[E] = \varphi^{-1}\tau^{-1}[X]$$

and then separated, with components taken in the order $H_1$, $H_2$:

$${}_1[E] = \varphi^{-1}\,{}_1[\tau^{-1}X], \qquad {}_2[E] = {}_2[\tau^{-1}X].$$

The information $\tau = \tau_0$ determines the $H_1$ orbit of $[E]$ on $G^*$ (also the $H_1$ orbit of $E$ on $\mathcal{X}$); the orbit can be designated alternatively by the reference point

$${}_1[E]^{-1}[E] = {}_2[E] = {}_2[\tau_0^{-1}X]$$

on $G^*$. The information also produces a restricted structural equation describing position on that orbit:

$${}_1[E] = \varphi^{-1}\,{}_1[\tau_0^{-1}X], \qquad {}_1[\tau_0^{-1}X] = \varphi\,{}_1[E].$$

(See Figure 14.)

Figure 13 The information $\theta \in \tau_0 H_1$ or, equivalently, $\tau = \tau_0$.

Figure 14 The orbit of $[E]$ under $H_1$ as determined by the information $\tau = \tau_0$.

The error distribution conditional on its $H_1$ orbit can be expressed as

$${}_1[E] : H_1[E] = H_1[\tau_0^{-1}X], \quad GE = GX,$$

or equivalently as

$${}_1[E] : H_1 E = H_1 \tau_0^{-1}X.$$

The conditioned model then has the form:

Conditioned Model

$${}_1[E] : H_1 E = H_1 \tau_0^{-1}X,$$

$${}_1[\tau_0^{-1}X] = \varphi\,{}_1[E].$$

Alternatively, suppose the information $\tau = \tau_0$ had been available when the model was being constructed. The error distribution would be that of $E$, and the structural equation would be

$$X = \tau_0\varphi E$$

or

$$Y = \tau_0^{-1}X = \varphi E.$$

This would give the structural model with error variable $E$ and structural equation

$$Y = \varphi E$$

based on a quantity $\varphi$ in the group $H_1$. The reduced model would be

$${}_1[E] : H_1 E = H_1 Y,$$

$${}_1[Y] = \varphi\,{}_1[E].$$

This is the same as the conditioned model in the preceding paragraph.

*8 MARGINAL AND CONDITIONAL DISTRIBUTIONS 
The test of significance in Section 6 required a marginal distribution of an error variable. The use of outside information in Section 7 produced a conditioned distribution of the error variable. Marginal and conditional distributions can be derived quite generally. They provide a decomposition of the error distribution and a corresponding decomposition of the structural distribution.

Consider the general structural model with error density $f(E)$; suppose that Assumption 3 (Section 3) holds and that the quantity $\theta$ can be factored uniquely:

$$\theta = \tau\varphi,$$

where $\tau$ and $\varphi$ are in subgroups $H_2$ and $H_1$, respectively.

Let $[X]$ be a transformation variable with invariant differential

$$dm(X) = \frac{dX}{J_n(X)}$$

on $\mathcal{X}$. Let $d\mu(g)$, $d\nu(g)$ with $\Delta(g)$ be invariant differentials on $G$, $d\mu_1(h)$, $d\nu_1(h)$ with $\Delta_1(h)$ be invariant differentials on $H_1$ (an open set in $L_1$ dimensions), and $d\mu_2(k)$, $d\nu_2(k)$ with $\Delta_2(k)$ be invariant differentials on $H_2$ (an open set in $L_2$ dimensions).

The adjusted differential

$$\frac{d\mu(hk)}{\Delta(k)}$$

is invariant under multiplication on the left by an element of $H_1$ and on the right by an element of $H_2$. The composite differential

$$d\mu_1(h)\, d\nu_2(k)$$

has the same invariance property. Let $\delta$ be their ratio at the identity $i$; then

$$d\mu(hk) = \delta\, d\mu_1(h)\, \frac{\Delta(k)}{\Delta_2(k)}\, d\mu_2(k) = \delta\, d\mu_1(h)\, \Delta(k)\, d\nu_2(k).$$

The substitution $h \to h^{-1}$, $k \to k^{-1}$ leads to a parallel relation:

$$d\nu(kh) = \delta\, d\nu_1(h)\, \frac{\Delta_2(k)}{\Delta(k)}\, d\nu_2(k) = \delta\, d\nu_1(h)\, \Delta^{-1}(k)\, d\mu_2(k).$$

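The probability element for the error variable $[E]$ can be expressed in terms of the components ${}_1[E]$ in $H_1$ and ${}_2[E]$ in $H_2$, where $[E] = {}_1[E]\,{}_2[E]$:

$$g([E] : D)\, d\mu([E])$$

$$= k(D)\, f([E]D)\, d\mu([E])$$

$$= k(D)\, f({}_1[E]\,{}_2[E]\,D)\, \delta\, d\mu_1({}_1[E])\, \frac{\Delta({}_2[E])}{\Delta_2({}_2[E])}\, d\mu_2({}_2[E])$$

$$= k_1({}_2[E]D)\, f({}_1[E]\,{}_2[E]\,D)\, d\mu_1({}_1[E]) \cdot \frac{k(D)\,\delta}{k_1({}_2[E]D)}\, \frac{\Delta({}_2[E])}{\Delta_2({}_2[E])}\, d\mu_2({}_2[E]).$$

The last expression is written as a product of the conditional distribution of ${}_1[E]$ given ${}_2[E]$ and the marginal distribution of ${}_2[E]$; the constant $k_1({}_2[E]D)$ normalizes the conditional distribution.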

The structural probability element for $\theta = \tau\varphi$ can similarly be expressed in terms of the components $\varphi$ in $H_1$ and $\tau$ in $H_2$:

$$g^*(\theta : X)\, d\theta = k(D)\, f(\theta^{-1}X)\, \Delta([X])\, d\nu(\theta)$$

$$= k(D)\, f(\varphi^{-1}\tau^{-1}X)\, \Delta([X])\, \delta\, d\nu_1(\varphi)\, \Delta^{-1}(\tau)\, d\mu_2(\tau).$$

The restricted structural equation for $\varphi$ and ${}_1[E]$ is

$${}_1[E] = \varphi^{-1}\,{}_1[\tau^{-1}X]$$

and the corresponding differential is

$$d\mu_1({}_1[E]) = \Delta_1({}_1[\tau^{-1}X])\, d\nu_1(\varphi);$$

the remainder of the structural equation is

$${}_2[E] = {}_2[\tau^{-1}X].$$

The normalizing constant of the conditional error distribution can be used:

$$g^*(\theta : X)\, d\theta = k_1({}_2[\tau^{-1}X]D)\, f(\varphi^{-1}\tau^{-1}X)\, \Delta_1({}_1[\tau^{-1}X])\, d\nu_1(\varphi) \cdot \frac{k(D)\,\delta\,\Delta([X])}{k_1({}_2[\tau^{-1}X]D)\, \Delta_1({}_1[\tau^{-1}X])}\, \Delta^{-1}(\tau)\, d\mu_2(\tau).$$

This expresses the structural distribution as a product of the conditional distribution of $\varphi$ given $\tau$, and the marginal distribution of $\tau$.
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found recurrently through the statistical itera ure, a _ ^ contrast, their 

classical model and usually as sa device ° ^ P nt of the model, 
use here is primary and essential . the g Hktributions and of 

The analysis of conditional and marginal but ons a 

factorizations of the invariant measures has developed in interchang 
A. Kalotay, H. Levenbach, and J. Whitney. ^ 

Fraser D. A. S. (1961), The fiducial method and 1 ' 9 ' 

Fraser D. A. S. (1966), Structural probability an a g > statistics, 

Fraser D. A. S. (1967), Data transformations and the linear model, * 

38, 1456-1465. . , nrthoeonal group, Ann. Math. 

James A. T. (1954), Normal multivariate analysis and the or g g V 

Statistics, 25, 40-75. >j PW York. 

Lehmann E. L. (1959), Testing Statistica ypo y, ton University. 

PeisakofT M. (1951), Transformation parameters, Ph.D. thesis, rr 
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7 <j The Structural Model 

using the notation of Problem 27, Chapter One. Give the formula for 


3. ( Continuation ). Consider the preceding group acting on R~ n - 


(i) Show that the group is unitary on R 2n . 

(ii) Determine the form of the orbits GX. Note: X can be viewe as apoin U’ in' 
x n , i» 2n ) in R 2n or as an ordered pair of points, (* u , .... *i„)» C* 2 i> • • • » x ^- h 

(iii) Show that 

r i o (H 


[X) = i o 

x 2 0 1 J 

is a transformation variable. Give three other examples of transformation variab 

(iv) Show that 

n 

dm(X ) = XT (dx i i dx *^ 

i = 1 ■' 

(v) Give the formula for 

rr a ."i n oYi f ' x m~) 


4. Consider the scale 


acting on R 2n : 


[f 1 

0 

0 " 

1 0 

C 1 

0 

Ho 

0 

c 2 


0 < q < co | 

0 < c, < co 
* i 
J 

" 1 1 
■■g a n ••• *i» 

v_ ^21 ‘ 


(i) Show that G z is an Abelian group. y _ o' or 

(ii) Show that the group is unitary on R 2n ; omit points X having bn. • • • > i« 

. . . , x 2n ) = O'. 

(iii) Determine the form of the orbits G^X; see the Note in ro em 

(iv) Show that 

rio o t ri o 0 

m= o o - o #) o 

.0 0 • (Zx&fiJ U 0 s£X)J 


is a transformation variable. 
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Give the formula for 

t°i r 1 °nr ^ 

loJ’U iJJU/ 

7 ( Continuation ). Consider the group in Problem 6 acting on R . 

r i ••• i "i r i ••• 1 1 


(i) Show that the group is unitary on R Zn ; omit points with O u , . . . , *i«) 0 • 

(ii) Show that 

r 1 0 ol 


m = o i o 
.0 r(J) l 


where f(X) = S * 14 * a ,/S is a transformation variable. 

(iii) Show that 

= n dx 2i ). 

(iv) Give the formula for 


(p 

0 

0 " 

0 

Cl 

0 

llo 

fc 

c 2 - 


LL° J L /c 1 ••• *! 

8. Consider the progression group (scale and shear group) 


0 < Cj < oo 
— oo < k < oo 


acting on R Zn (X as in problem 7): 

X = gX. 

(i) Show that G i is a group. . t 4 

(ii) For n >2 show that G 4 is unitary on R Zn ; omit points having (* X1 , - • • > hJ and 

(x „,, . . . , x 2n ) linearly dependent. . . - „ n 

(iii) Describe an orbit GJT; for convenience represent X as an ordered pair of points m . 

(iv) Show that ■ _ 

n o o ^ 


[ X ] = o 5 1 (X) 0 

^0 t(X) |S(2)(^) 


is a transformation variable, where s^X) — (S 

,(*) = , , m (*) - [2 (** - t(X)x u KX)f^. 

s 1 {X) 
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(v) Show that 


^(X>f 2) (Z) ’ 

* r \ ^’l dk dc% dc-t dk dco 

“'"W = -73~ • *« - 

<-X<-2 c l c 2 


A(<g) == — - 
c 2 

(vi) Show that can be expressed alternatively as 

T°] f Ci 0 11 - 

-loJ’U c 2 JJ’ 

check the multiplication. Give the formula for 


0 < Cj- < co 

— CO < k < oo 

— co < a < co 


9. Consider the location-progression group 

( 1 0 (T 0 < Cj < <xA 

a x c 4 0 : — co < k < oo 1 

- a 2 k c 2 J — CO < Oj < to J 

acting on i? 2n (jT as in problem 7): 

(i) Show that G 5 is a group. 

(ii) For n > 3 show that G 5 is unitary on 7? 2n ; omit points with (1, . . . , 1), . . , x ln ), 

( x z \, .... x 2 n ) linearly dependent. 

(iii) Show that 

"10 0 " 

[X] = Si SjttX) 0 

^ ®2 *(-^0 J (2)('^)_, 

is a transformation variable with 

hW = [2 (*ii - *i) 2 ]H 
/(Z) = ~ ~ ^2) 

’ . 

J (2)W = t2 (*K - *2 - t(Xj@ u - xJ/s^X))*] 1 ^ 

(iv) Show that 

nc^ 

'1 (-*>&)(■*’) 

. , dan dc-% dk den 

dp(g) = 2 , 

C 1 C 2 

, , „ r/a, <7c, dk dc 9 
— • 

C 1 C 2 

AW = i 


(iv) Show that 


dm(X) = 
dp(g) = 
dv(g) = 

A( <? ) = 
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(v) Show that g can be expressed alternatively as 

fc, Oil 


Give the formula for 


’I k C a JJl® 2 l x 2« 


10. Consider the positive linear group 

f f 1 0 0 " 

Go = I 0 c xl c 12 '■ 


acting on R 2n ( X as in Problem 7) : 


X = gX. 


(i) Show that G 6 is a group, a non-Abelian group. . , n -Cl 

(ii) For n >2 show that G 6 is unitary on R n \ omit the points X a i g ( n» • • • > m 4 

and (x n , .... x 2n ) linearly dependent. ■ 

(iii) Show that j 

dc xx dc 2X dc l2 dc 22 j 


dc xx dcy> dc 21 dc 22 

Ms) ~ tJF ' 

'Sfgiml. 

(iv) Show that an invariant differential is 

XT (dx u dx 2i ) 
dm{X)=— 

*(v) Develop a transformation variable. 

11. Consider the positive affine group (location positive-linear group 

(Cl 0 0 X 4 

c n . X 12 >ol 

G 7 = C 11 C 12 • C >U 

j C 21 C 22 1 

V L a 2 C 21 c 22 -7 ' 

acting on R 2n ( X as in Problem 7) : 

X = gX. 

(i) Show that G_ is a group. . 

(ii) For n > 3 show that G 7 is unitary on R n ; omit poin s . 
(x u , . . . , x ln ), (x 2V x 2n ) linearly dependent. 

(iii) Show that 

da-, da 0 dc „ dc 2X dc 12 dc 22 

‘W-'-hjp ’ 

da x dc xx dc 12 da 2 dc 2l dc 22 

*W = [Jjs '• 

Afe) 


X having (1, . . . , 1)» 
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(iv) Show that an invariant differential is 

— n (dx^dx,^) 

{X) \xxW ' 

*(v) Develop a transformation variable. 

12. Let H be a subgroup of a group G. 

(i) Show that the sets $\{gH : g \in G\}$ form a partition of $G$, the left cosets of the subgroup $H$ (Example: The first part of Figure 9; $G$ is the positive affine group, and $H$ is the scale group). Similarly, show that the sets $\{Hg : g \in G\}$ form a partition of $G$, the right cosets of $H$ (Example: The second part of Figure 9). The set-forming braces are used in the free sense: the set of distinct entities $gH$ formed as $g$ takes values in $G$ (the set $g_1 H = g_2 H$ with $g_1 \ne g_2$ occurs once in $\{gH : g \in G\}$).

(ii) $H$ is a normal subgroup in $G$ if $gH = Hg$ for all $g$ in $G$. Show that the partition into left cosets is the same as the partition into right cosets if and only if $H$ is normal in $G$. (Example: Figure 10; $G$ is the positive affine group and $H$ is the location group.)

*13. Consider a group $G$ and a partition $\{H_\alpha\}$ of $G$; suppose that the partition $\{H_\alpha\}$ is closed under left multiplication by any element in $G$ (i.e., for any $g$ and $H_\alpha$ there is a set $H_\beta$ in the partition such that $gH_\alpha = H_\beta$). Show that one of the sets $H_\alpha$ is a subgroup $H$ of $G$ and that the partition is by left cosets of $H$.

14. Consider a partition $\{gH\}$ of a group $G$ into cosets with respect to a normal subgroup $H$.

(i) Show that a natural multiplication of cosets is defined by

$$(g_1 H)(g_2 H) = g_1 g_2 H.$$

(ii) Show that the multiplication rule for cosets satisfies the axioms of a group. This group defined on the cosets is the factor group $G/H$ of $G$ by the normal subgroup $H$.

15. Consider the notation for semidirect products on p. 69 in Section 7. Show that

$${}_1[g^{-1}] = [g]_1^{-1}, \qquad {}_2[g^{-1}] = [g]_2^{-1}.$$

16. (i) Show that the location group $H_1$ is a normal subgroup of the positive affine group $G$ (Section 11 in Chapter One and Figure 8 in Chapter One).

(ii) Show that the factor group $G/H_1$ can be represented by the scale group $H_2$ (Figure 8 in Chapter One).

(iii) Show that

$$[a, c] = {}_1[a, c]\;{}_2[a, c] = [a, 1][0, c],$$

$$[a, c] = [a, c]_2\,[a, c]_1 = [0, c][c^{-1}a, 1].$$

Note that

$${}_2[a, c] = [0, c], \qquad [a, c]_2 = [0, c],$$

a consequence of normality of $H_1$ (see Figure 12 with $G$, $H_1$, $H_2$ as defined here).

*17. Consider the example at the beginning of Section 6. Use the notation of Section 7 
and the results in Problem 1 6. 
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(i) Examine the hypothesis /J “ ^“° r ”*^(1 e 

of H 2 on G* can be indexed by t([e, s e D “ Le ’ ^ v ’ i 

one-to-one correspondence with H z [e, s e ]). Determine the value of 

f([p 0 . 

Compare with Section 17 in Chapter One- (Figure 10). Show that the orbits of 

(ii) Examine hypothesis 0 ^ ®t> or ^ ^ ]^j e° shovTthaUe, s B ] is in one-to-one corre- 

H x on G* can be indexed by t{[e, sj) - L e > s 2 

spondence with H x [e, s e ]). Determine the value of 

/([0, OoHtc, sj). 

Compare with Section 17 in Chapter normal subgroup of the location- 

18. (i) Show that the location group G x (Problem i) is a norm S Y 

progression group G 5 (Problem 9). represented by the progression group G 4 

(ii) Show that the factor group G^\G X can be represent y ? & 

(Problem 8). 

(iii) Show that 

"1 0 



tel 

4 


tel 

4 


0 c x 0 
0 k C 2 

n o <n 

0 c x 0 
0 k c. 


a consequence of normality of H x . 
(iv) Show that 


tel = 


tel 


l o o’) 

*i 1° 

0 1 


1 

1, 


0 0 
1 0 


c% "1 

c^kc^ai + c-r lfl 2 ® * 

(v) Check the preceding components using the alternative notation: 

- - 0 vt rrtn r c , o')! 


a i 


K C 2 

ffi 0 
k c 2 

1 0T| 
0 1 

«i 
«2 


i oYi 
o 1 
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19. (i) Show that the location group G x (Problem 2) is a normal subgroup of the positive 
affine group G 7 (Problem 11). 

(ii) Show that the factor group G 7 /Gi can be represented by the positive linear group G fi 
(Problem 10). 

(iii) Show that 


tel 


tel = 


o o 


t 12 
c 22-^ 
0 ' 
C 12 


a consequence of the normality of H x . 

(iv) Show that 

tel = 


1 0 0 
2 X 1 0 

*2 0 1. 

1 


0 0 


kS 

II 

C 1 ^! + 

c n a 2 

1 0 


_ c 21 a x + 

c 22 a 2 

0 1 _ 

r c “ 


f c li 

c 12 ^j _1 

[c 21 

c 22 J ~ 

L C 21 

c 22 J 


where 


is an inverse matrix. 

(v) Check the decompositions using the alternative notation: 


1 0 
0 1 

<-ii c 12 
C 21 c 22 


0 
0^ 

"’ll 

Lk. <- 21 


<-11 

C 21 


■ [: a 


*20. If $H_1$ is normal in $G$ and $H_1 \subset H_2 \subset G$, where $H_2$ is a subgroup of $G$, show that $H_1$ is normal in $H_2$. Compare Problems 18 and 19 and check $G_1 \subset G_5 \subset G_7$.

*21. (i) For the measurement model with normal error determine the conditional error 
probability distribution and the conditional structural distribution given the information 
fi = fx Q (see Figure 9). 

(ii) The structural model constructed with the information ji = fi 0 is 

n /( e ,o n de it 

Vi = x i - Po ” ae i 


ko 





S4 The Structural Model wo . - 

wh«re *. error dis.ribu.ion is thal of a standard «£*«£££ £££ “ ' J ‘ 
meat model with multiplicative error (Problem .19 m Chapter une;. _ »j 

probability distribution and the structural distri ution or 

(iii) Compare the distributions for a - f the aua ntity 0 may have -v^ 

n 2 . In a tare application of a tarucooal in^Uhe «>“^ e d i“bn,L 
occurred as a realized value from a ran P . fj f r0 m the structural model 

Primary interest would center als0 L interest in the information f. 

itself. In certain circumstances, however, mere migu 

from the combined processes. The composite model is £ 

p(d) dv(0), 

/( E)dm(E ), i 

X=0E. 

The model has a distribution 

structural equation; it has a distribution ati „„ linking the known X and 

value in the structural equation, and it has H 

the unknowns d and E. 

(i) Consider the contours c ° rres P™ d ‘£ g 

Show that this information is based on the partition 

G(6, E ) = {(Qf\ g E ) ■ 8 e G} ' 

(ii) Show that [£] is a transformation variable for these orbits; show that ([X], D(X)) is 
the reference point; and show that X indexes t e or i s. 

(ii ' l)ShOWthat dv(0)dm(E) 

is an invariant differential. . information is 

(iv) Show that the conditional distribution given available information 

k*(X)/{[E]D)p({X][Er 1 ) dp([E\) 

i„ terms of [E] or is A([fl) <M« 

quantity 0 in a group G. 

(i) Check that the composite model, 

A( E)f i {F)dEdF, 

(X, Y) = 8{E, F ), 

is a structural model. „ f ^ th _ ,-nmnosite model, 

(ii) Show that the structural distribution for 0 from the composite mo 

g*(d : X, Y) dv(6), 

can be obtained from the joint distribution of 0 - A fhe ^ rst ™ invariant differ- 
the second model by imposing the condition 0 t = 0 2 relative g 

ential. 


CHAPTER THREE 


Linear Models 


The measurement model was developed to describe a system with all con- 
trollable variables held constant : the response variable was real valued ; the 
internal error as it expressed itself in the response was distributed with known 
form. 

In this chapter two structural models are developed as different extensions 
of the measurement model. The regression model handles a broad class of 
systems in which the controllable variables are allowed to vary or are manip- 
ulated. The progression model provides an extension in a different direction 
and handles a rather special kind of system with vector-valued response 
variable, special in being progressively structured in terms of error compo- 
nents. The range of applications of the second model is limited, but it supplies 
some of the notation and method to be used for a more comprehensive 
model that is developed in later chapters. 

The regression model and the progression model can be combined in a 
single general composite model. A succession of problems presents this 
extension. 

THE REGRESSION MODEL 

1 EXAMPLES 

Consider a stable system having a real-valued response. Suppose that selected controllable variables are subject to manipulation and that the response component of internal error has a known distribution $f(e)\,de$ on $R^1$. Suppose also that twelve performances of the system have been made and $y_1, \ldots, y_{12}$ are the observed values of the response variable.†

† In the presence of controllable variables a response variable is typically designated by $y$ and a controllable variable by $x$.

1.1 If the controllable variables do not affect the response level, then the measurement model is applicable. Let $\mu$ designate the general response level and $\sigma$ designate the response scaling of error. The structural equation can be written in the special notation of Chapter One or in the matrix form indicated by Problem 26 in Chapter One; the matrix form is used for generalization here. The structural equation is

$$\begin{bmatrix} 1 & 1 & \cdots & 1 \\ y_1 & y_2 & \cdots & y_{12} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \mu & \sigma \end{bmatrix} \begin{bmatrix} 1 & 1 & \cdots & 1 \\ e_1 & e_2 & \cdots & e_{12} \end{bmatrix}.$$

The positive affine transformations

$$G = \left\{ \begin{bmatrix} 1 & 0 \\ a & c \end{bmatrix} : -\infty < a < \infty,\ 0 < c < \infty \right\}$$

form a group under matrix multiplication. The orbit of a point $y = (y_1, \ldots, y_{12})'$ is the set

$$G \cdot y = \{ a\mathbf{1} + cy : -\infty < a < \infty,\ 0 < c < \infty \} = L^+(\mathbf{1}; y)$$

(see Figure 1). The orbit is contained in the two-dimensional subspace

$$L(\mathbf{1}, y) = \{ a_1\mathbf{1} + a_2 y : -\infty < a_1, a_2 < \infty \}$$

but consists of points with positive coefficient for $y$.

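1.2 Suppose that nine of the performances were chosen at random and given a certain treatment and that the remaining three were given no treatment; designate those with no treatment by 1, 2, 3 and those with treatment by 4, ..., 12. Let $\beta_1$ designate the response level with no treatment, and $\beta_2$ designate the increase in level from no treatment to treatment. The structural equation can be expressed as

$$\begin{bmatrix} 1&1&1&1&1&1&1&1&1&1&1&1 \\ 0&0&0&1&1&1&1&1&1&1&1&1 \\ y_1&y_2&y_3&y_4&y_5&y_6&y_7&y_8&y_9&y_{10}&y_{11}&y_{12} \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ \beta_1&\beta_2&\sigma \end{bmatrix} \begin{bmatrix} 1&1&1&1&1&1&1&1&1&1&1&1 \\ 0&0&0&1&1&1&1&1&1&1&1&1 \\ e_1&e_2&e_3&e_4&e_5&e_6&e_7&e_8&e_9&e_{10}&e_{11}&e_{12} \end{bmatrix}.$$

Figure 1 The subtending one-dimensional subspace $L(\mathbf{1})$. The orbit $G \cdot y = L^+(\mathbf{1}; y)$, a positive half of the two-dimensional subspace $L(\mathbf{1}, y)$.

An additional row has been adjoined to the error and response vectors to permit the continued use of matrix multiplication.

The transformations

$$G = \left\{ \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ a_1&a_2&c \end{bmatrix} : -\infty < a_u < \infty,\ 0 < c < \infty \right\}$$

form a group under matrix multiplication:

$$\begin{bmatrix} 1&0&0 \\ 0&1&0 \\ A_1&A_2&C \end{bmatrix} \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ a_1&a_2&c \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ A_1 + Ca_1 & A_2 + Ca_2 & Cc \end{bmatrix},$$

$$\begin{bmatrix} 1&0&0 \\ 0&1&0 \\ a_1&a_2&c \end{bmatrix}^{-1} = \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ -c^{-1}a_1 & -c^{-1}a_2 & c^{-1} \end{bmatrix}.$$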
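The product and inverse formulas for these matrices can be verified by direct multiplication. The sketch below uses exact rational arithmetic; the function names `g`, `matmul`, and `g_inverse` and the numerical entries are hypothetical, introduced only for this check.

```python
from fractions import Fraction

def g(a1, a2, c):
    """Matrix [[1,0,0],[0,1,0],[a1,a2,c]] of the regression-scale group (c > 0)."""
    return [[Fraction(1), Fraction(0), Fraction(0)],
            [Fraction(0), Fraction(1), Fraction(0)],
            [Fraction(a1), Fraction(a2), Fraction(c)]]

def matmul(A, B):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def g_inverse(a1, a2, c):
    """Claimed inverse: last row (-a1/c, -a2/c, 1/c)."""
    c = Fraction(c)
    return g(-Fraction(a1) / c, -Fraction(a2) / c, 1 / c)

identity = g(0, 0, 1)
G1 = g(2, -1, 3)
G1_inv = g_inverse(2, -1, 3)

left = matmul(G1, G1_inv)    # should be the identity
right = matmul(G1_inv, G1)   # should be the identity

# Closure: last row of the product is (A1 + C*a1, A2 + C*a2, C*c).
product = matmul(g(5, 4, 2), g(2, -1, 3))
print(left == identity, right == identity, product[2])
```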



This is a simple generalization of the positive affine group; it is a regression-scale group. The orbit of a point $y = (y_1, \ldots, y_{12})'$ is the set

$$G \cdot y = \{ a_1 v_1 + a_2 v_2 + cy : -\infty < a_u < \infty,\ 0 < c < \infty \},$$

where†

$$v_1 = (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)',$$

$$v_2 = (0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1)'.$$

(See Figure 2.) The orbit is a half three-space; the orbit is contained in the three-dimensional subspace

$$L(v_1, v_2, y) = \{ a_1 v_1 + a_2 v_2 + a_3 y : -\infty < a_u < \infty \}$$

but consists of the half with positive coefficient for $y$:

$$L^+(v_1, v_2; y) = \{ a_1 v_1 + a_2 v_2 + cy : -\infty < a_u < \infty,\ 0 < c < \infty \}.$$

Figure 2 The subtending two-dimensional subspace $L(v_1, v_2)$. The orbit $G \cdot y = L^+(v_1, v_2; y)$, a positive half of the three-dimensional subspace $L(v_1, v_2, y)$.

† The elements of $G$ are transformations on $R^{12}$. A transformation as represented by a matrix can act on $y$ by matrix multiplication provided the additional vectors are adjoined to $y$.

The orbit is subtended by the two-dimensional subspace

$$L(v_1, v_2) = \{ a_1 v_1 + a_2 v_2 \}$$

and consists of the positive translates

$$L(v_1, v_2) + cy$$

of that two-dimensional subspace. Note that points $y$ in the subspace $L(v_1, v_2)$ are implicitly excluded; compare the measurement model, Section 12, Chapter One.

The characteristics of the process can be described in various ways. As alternative quantities describing the process, let $\alpha_1$ designate the average response level corresponding to the twelve performances, and $\alpha_2$ designate the increase from the average level for no treatment to the average level with treatment. The structural equation can then be expressed as

$$\begin{bmatrix} 1&1&1&1&1&1&1&1&1&1&1&1 \\ -\tfrac34&-\tfrac34&-\tfrac34&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14 \\ y_1&y_2&y_3&y_4&y_5&y_6&y_7&y_8&y_9&y_{10}&y_{11}&y_{12} \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ \alpha_1&\alpha_2&\sigma \end{bmatrix} \begin{bmatrix} 1&1&1&1&1&1&1&1&1&1&1&1 \\ -\tfrac34&-\tfrac34&-\tfrac34&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14 \\ e_1&e_2&e_3&e_4&e_5&e_6&e_7&e_8&e_9&e_{10}&e_{11}&e_{12} \end{bmatrix}.$$

The general level corresponding to the first three performances is

$$\alpha_1 - \tfrac34\alpha_2 = \beta_1,$$

and corresponding to the remaining nine is

$$\alpha_1 + \tfrac14\alpha_2 = \beta_1 + \beta_2.$$

The orbit of a point $y$ is of course the same as before. The orbit is

$$G \cdot y = \{ a_1 w_1 + a_2 w_2 + cy : -\infty < a_u < \infty,\ 0 < c < \infty \} = L^+(w_1, w_2; y),$$

where

$$w_1 = (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)',$$

$$w_2 = (-\tfrac34, -\tfrac34, -\tfrac34, \tfrac14, \tfrac14, \tfrac14, \tfrac14, \tfrac14, \tfrac14, \tfrac14, \tfrac14, \tfrac14)'.$$

Figure 3 The subtending two-dimensional subspace $L(w_1, w_2) = L(v_1, v_2)$.

The basis $v_1$, $v_2$ for the subtending space has been replaced by the new basis $w_1$, $w_2$ (see Figure 3). Note that $w_2$ is orthogonal to $w_1$ $(= v_1)$.

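The new matrix expression for the observation vector $y$ can be expressed in terms of the old:

$$\begin{bmatrix} w_1' \\ w_2' \\ y' \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ -\tfrac34&1&0 \\ 0&0&1 \end{bmatrix} \begin{bmatrix} v_1' \\ v_2' \\ y' \end{bmatrix};$$

in brief, the change of basis for the subtending subspace is

$$\begin{bmatrix} w_1' \\ w_2' \end{bmatrix} = \begin{bmatrix} 1&0 \\ -p_{21}&1 \end{bmatrix} \begin{bmatrix} v_1' \\ v_2' \end{bmatrix} = P \begin{bmatrix} v_1' \\ v_2' \end{bmatrix}, \qquad p_{21} = \tfrac34.$$

The new structural equation can be related to the old:

$$\begin{bmatrix} 1&0&0 \\ -p_{21}&1&0 \\ 0&0&1 \end{bmatrix} \begin{bmatrix} v_1' \\ v_2' \\ y' \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ \alpha_1&\alpha_2&\sigma \end{bmatrix} \begin{bmatrix} 1&0&0 \\ -p_{21}&1&0 \\ 0&0&1 \end{bmatrix} \begin{bmatrix} v_1' \\ v_2' \\ e' \end{bmatrix}.$$

And the new quantity in terms of the old can then be extracted:

$$\begin{bmatrix} 1&0&0 \\ 0&1&0 \\ \alpha_1&\alpha_2&\sigma \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ -p_{21}&1&0 \\ 0&0&1 \end{bmatrix} \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ \beta_1&\beta_2&\sigma \end{bmatrix} \begin{bmatrix} 1&0&0 \\ -p_{21}&1&0 \\ 0&0&1 \end{bmatrix}^{-1},$$

$$(\alpha_1, \alpha_2) = (\beta_1, \beta_2)P^{-1} = (\beta_1, \beta_2) \begin{bmatrix} 1&0 \\ p_{21}&1 \end{bmatrix}.$$

Note that $\alpha_2 = \beta_2$. The alternative structural equation with its orthogonal basis vectors $w_1$, $w_2$ has some advantages for later analysis.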
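The change from $(\beta_1, \beta_2)$ to $(\alpha_1, \alpha_2)$ can be checked numerically. The sketch below builds $w_2$ from $v_2$ by removing its component along $v_1$, confirms orthogonality, and confirms that both parameterizations give the same response levels. The values chosen for $\beta_1$, $\beta_2$ are hypothetical.

```python
from fractions import Fraction

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Structural vectors for three untreated and nine treated performances.
v1 = [Fraction(1)] * 12
v2 = [Fraction(0)] * 3 + [Fraction(1)] * 9

# Projection coefficient of v2 on v1, and the orthogonalized vector w2.
p21 = dot(v2, v1) / dot(v1, v1)
w1 = v1
w2 = [b - p21 * a for a, b in zip(v1, v2)]

# Hypothetical old quantities and the corresponding new quantities:
# (alpha1, alpha2) = (beta1 + p21*beta2, beta2).
beta1, beta2 = Fraction(10), Fraction(2)
alpha1, alpha2 = beta1 + p21 * beta2, beta2

# Response levels under the two parameterizations.
level_old = [beta1 * a + beta2 * b for a, b in zip(v1, v2)]
level_new = [alpha1 * a + alpha2 * b for a, b in zip(w1, w2)]
print(p21, dot(w1, w2), level_old == level_new)
```

Here `alpha1` is the average of the twelve response levels, in agreement with its description in the text.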

1.3 Now suppose that three of the nine treatment performances were chosen at random and given a level 0 of a variable $x$ integral to the treatment, that three of the remaining six performances were chosen at random and given the level $x = 1$, and the remaining three were given the level $x = 2$. Let the performances be numbered 4, 5, 6 for level $x = 0$; 7, 8, 9 for level $x = 1$; and 10, 11, 12 for level $x = 2$.

If the additional variable $x$ affects the response level linearly with coefficient $\beta_3$, then the structural equation can be expressed as

$$\begin{bmatrix} 1&1&1&1&1&1&1&1&1&1&1&1 \\ 0&0&0&1&1&1&1&1&1&1&1&1 \\ 0&0&0&0&0&0&1&1&1&2&2&2 \\ y_1&y_2&y_3&y_4&y_5&y_6&y_7&y_8&y_9&y_{10}&y_{11}&y_{12} \end{bmatrix} = \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ \beta_1&\beta_2&\beta_3&\sigma \end{bmatrix} \begin{bmatrix} 1&1&1&1&1&1&1&1&1&1&1&1 \\ 0&0&0&1&1&1&1&1&1&1&1&1 \\ 0&0&0&0&0&0&1&1&1&2&2&2 \\ e_1&e_2&e_3&e_4&e_5&e_6&e_7&e_8&e_9&e_{10}&e_{11}&e_{12} \end{bmatrix}.$$

The transformations

$$G = \left\{ \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ a_1&a_2&a_3&c \end{bmatrix} : -\infty < a_u < \infty,\ 0 < c < \infty \right\}$$

form a group under matrix multiplication, an example of a regression-scale group. The orbit of a point $y = (y_1, \ldots, y_{12})'$ is the set

$$G \cdot y = \{ a_1 v_1 + a_2 v_2 + a_3 v_3 + cy : -\infty < a_u < \infty,\ 0 < c < \infty \} = L^+(v_1, v_2, v_3; y),$$

where

$$v_3 = (0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2)'.$$

Figure 4 The three-dimensional subspace $L(v_1, v_2, v_3)$ which subtends the half four-space $G \cdot y = L^+(v_1, v_2, v_3; y)$.

(See Figure 4.) The orbit is a half four-space; it is contained in the four-dimensional subspace $L(v_1, v_2, v_3, y)$ but consists of the half with positive coefficient for $y$. The orbit is subtended by the three-dimensional subspace

$$L(v_1, v_2, v_3) = \{ a_1 v_1 + a_2 v_2 + a_3 v_3 \},$$

and it consists of the positive translates

$$L(v_1, v_2, v_3) + cy$$

of that three-dimensional subspace.

As alternative quantities describing the process, let $\alpha_1$ designate the average response level corresponding to the twelve performances, let $\alpha_2$ designate the increase from the average level for no-treatment to the average level with-treatment, and let $\alpha_3$ be the coefficient for change of level with respect to the variable $x$. The structural equation can then be expressed as

$$\begin{bmatrix} 1&1&1&1&1&1&1&1&1&1&1&1 \\ -\tfrac34&-\tfrac34&-\tfrac34&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14 \\ 0&0&0&-1&-1&-1&0&0&0&1&1&1 \\ y_1&y_2&y_3&y_4&y_5&y_6&y_7&y_8&y_9&y_{10}&y_{11}&y_{12} \end{bmatrix} = \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ \alpha_1&\alpha_2&\alpha_3&\sigma \end{bmatrix} \begin{bmatrix} 1&1&1&1&1&1&1&1&1&1&1&1 \\ -\tfrac34&-\tfrac34&-\tfrac34&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14&\tfrac14 \\ 0&0&0&-1&-1&-1&0&0&0&1&1&1 \\ e_1&e_2&e_3&e_4&e_5&e_6&e_7&e_8&e_9&e_{10}&e_{11}&e_{12} \end{bmatrix}.$$

The orbit of a point $y$ is of course the same as before. The orbit is

$$G \cdot y = \{ a_1 w_1 + a_2 w_2 + a_3 w_3 + cy : -\infty < a_u < \infty,\ 0 < c < \infty \} = L^+(w_1, w_2, w_3; y),$$

where

$$w_3 = (0, 0, 0, -1, -1, -1, 0, 0, 0, 1, 1, 1)'.$$

The basis $v_1$, $v_2$, $v_3$ for the subtending subspace has been replaced by the new basis $w_1$, $w_2$, $w_3$ (see Figure 5). Note that $w_1$ $(= v_1)$, $w_2$, and $w_3$ are mutually orthogonal.

The new matrix expression containing the observation vector $y$ can be expressed in terms of the old:

Figure 5 The subspace $L(v_1, v_2, v_3) = L(w_1, w_2, w_3)$ which subtends the orbit $G \cdot y = L^+(w_1, w_2, w_3; y)$. The orthogonalized vector $w_3$ is obtained from $v_3$ by subtracting components $p_{31}v_1$, $p_{32}v_2$ or equivalently by subtracting components $p^{31}w_1$, $p^{32}w_2$ (note $p_{32} = p^{32}$). In the example $w_3 = v_3 - v_2$; the diagram illustrates the more general case in which $w_3$ is formed from $v_3$ by removing both $v_1$- and $v_2$-components.


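$$\begin{bmatrix} w_1' \\ w_2' \\ w_3' \\ y' \end{bmatrix} = \begin{bmatrix} 1&0&0&0 \\ -p_{21}&1&0&0 \\ -p_{31}&-p_{32}&1&0 \\ 0&0&0&1 \end{bmatrix} \begin{bmatrix} v_1' \\ v_2' \\ v_3' \\ y' \end{bmatrix};$$

in brief, the change of basis for the subtending subspace is

$$\begin{bmatrix} w_1' \\ w_2' \\ w_3' \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ -p_{21}&1&0 \\ -p_{31}&-p_{32}&1 \end{bmatrix} \begin{bmatrix} v_1' \\ v_2' \\ v_3' \end{bmatrix} = P \begin{bmatrix} v_1' \\ v_2' \\ v_3' \end{bmatrix}.$$

The new structural equation can be related to the old equation and the new quantity in terms of the old quantity can then be extracted:

$$\begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ \alpha_1&\alpha_2&\alpha_3&\sigma \end{bmatrix} = \begin{bmatrix} P & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ \beta_1&\beta_2&\beta_3&\sigma \end{bmatrix} \begin{bmatrix} P & 0 \\ 0 & 1 \end{bmatrix}^{-1},$$

or, in brief,

$$(\alpha_1, \alpha_2, \alpha_3) = (\beta_1, \beta_2, \beta_3)P^{-1} = (\beta_1, \beta_2, \beta_3) \begin{bmatrix} 1&0&0 \\ p^{21}&1&0 \\ p^{31}&p^{32}&1 \end{bmatrix}.$$

Note that $\alpha_3 = \beta_3$. The alternative structural equation with its orthogonal basis vectors $w_1$, $w_2$, $w_3$ is convenient for some analysis in later sections.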
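The orthogonal basis $w_1$, $w_2$, $w_3$ can be reproduced by successive orthogonalization of $v_1$, $v_2$, $v_3$. The following is a minimal sketch using the Gram-Schmidt procedure in exact arithmetic; the helper names are hypothetical, and the procedure is offered as one way to obtain the vectors quoted in the text, not necessarily the construction intended by the author.

```python
from fractions import Fraction

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthogonalize(vectors):
    """Gram-Schmidt without normalization: subtract from each vector its
    components along the previously constructed orthogonal vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coef = dot(v, b) / dot(b, b)
            w = [wi - coef * bi for wi, bi in zip(w, b)]
        basis.append(w)
    return basis

v1 = [Fraction(1)] * 12
v2 = [Fraction(0)] * 3 + [Fraction(1)] * 9
v3 = [Fraction(x) for x in (0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2)]

w1, w2, w3 = orthogonalize([v1, v2, v3])

# In this example the third orthogonalized vector reduces to v3 - v2.
v3_minus_v2 = [a - b for a, b in zip(v3, v2)]
print(w3 == v3_minus_v2, dot(w1, w2), dot(w1, w3), dot(w2, w3))
```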

2 THE MODEL 

Consider again a stable system with a real-valued response. Suppose that selected controllable variables are subject to manipulation and that the response component of internal error has a known distribution on $R^1$. Also suppose that there have been $n$ performances of the system and that the observations on the response variable are $y = (y_1, \ldots, y_n)'$. Suppose that various controllable variables have been manipulated and that information concerning the system presents the general response level as linear in structural vectors $v_1, \ldots, v_r$: values of treatment indicator-variables, or values of controllable variables, or values of combinations of these variables; compare with the examples in Section 1. As quantities characteristic of the system let $\sigma$ be the response scaling of error, and let $\beta_1, \ldots, \beta_r$ be the coefficients that express the response levels in terms of the structural vectors $v_1, \ldots, v_r$. The $n$ performances can then be described by the regression model

$$\prod_{i=1}^{n} f(e_i) \prod_{i=1}^{n} de_i,$$

$$y = \beta_1 v_1 + \cdots + \beta_r v_r + \sigma e.$$

The model has two parts: an error distribution which describes the internal operation of the system (with $e$ as a variable); and a structural equation in which a realized vector $e$ from the error distribution has determined the relation between the response observations $y$ and the unknown values $\beta_1, \ldots, \beta_r, \sigma$ for the system characteristics (with $e$ as a constant).

The notation can be made more compact by letting

$$Y = \begin{bmatrix} v_1' \\ \vdots \\ v_r' \\ y' \end{bmatrix}$$

designate the response vector $y$ with appended structural vectors $v_1, \ldots, v_r$; by letting

$$E = \begin{bmatrix} v_1' \\ \vdots \\ v_r' \\ e' \end{bmatrix}$$

designate the error vector $e$ with appended structural vectors; by letting

$$\theta = \begin{bmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ \beta_1 & \cdots & \beta_r & \sigma \end{bmatrix}$$

designate the composite quantity in matrix array; and by letting

$$f(E)\, dE = \prod f(e_i) \prod de_i$$

designate the error distribution. The regression model can then be written

$$f(E)\, dE,$$

$$Y = \theta E.$$

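The transformation $\theta$ is an element of the regression-scale group

$$G = \left\{ g = \begin{pmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ a_1 & \cdots & a_r & c \end{pmatrix} : \begin{array}{l} -\infty < a_u < \infty \\ 0 < c < \infty \end{array} \right\}$$

with group properties

$$\begin{pmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ A_1 & \cdots & A_r & C \end{pmatrix} \begin{pmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ a_1 & \cdots & a_r & c \end{pmatrix} = \begin{pmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ A_1 + Ca_1 & \cdots & A_r + Ca_r & Cc \end{pmatrix},$$

$$\begin{pmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ a_1 & \cdots & a_r & c \end{pmatrix}^{-1} = \begin{pmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ -a_1/c & \cdots & -a_r/c & 1/c \end{pmatrix}, \qquad i = \begin{pmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ 0 & \cdots & 0 & 1 \end{pmatrix}.$$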
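The group properties can be checked numerically. The following is a minimal sketch, not part of the original development: it stores a group element by its last row $(a_1, \ldots, a_r, c)$, and the function names `compose`, `inverse`, `act` and the small data set are illustrative assumptions.

```python
from fractions import Fraction

def compose(g, h):
    """Product g*h of two group elements, each stored as (a, c) for the last row (a_1, ..., a_r, c)."""
    (A, C), (a, c) = g, h
    return ([Au + C * au for Au, au in zip(A, a)], C * c)

def inverse(g):
    """Inverse element: last row (-a_1/c, ..., -a_r/c, 1/c)."""
    a, c = g
    return ([-au / c for au in a], 1 / c)

def act(g, V, y):
    """Carry y into a_1 v_1 + ... + a_r v_r + c y."""
    a, c = g
    return [sum(au * vu[i] for au, vu in zip(a, V)) + c * yi
            for i, yi in enumerate(y)]

# Hypothetical data: r = 2 structural vectors in R^4.
V = [[1, 1, 1, 1], [0, 1, 2, 3]]
y = [1, 3, 2, 6]
g = ([Fraction(2), Fraction(-1)], Fraction(3))
h = ([Fraction(1, 2), Fraction(4)], Fraction(1, 5))
identity = ([Fraction(0), Fraction(0)], Fraction(1))

gh = compose(g, h)
# Applying h and then g agrees with applying the product gh.
print(act(gh, V, y) == act(g, V, act(h, V, y)))
print(compose(g, inverse(g)) == identity)
```

Exact rational arithmetic is used so that the equalities hold exactly rather than up to rounding.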

The orbit of a point $Y$ is the set

$$GY = \{gY : g \in G\},$$

or equivalently the set

$$G\mathbf{y} = \left\{ a_1\mathbf{v}_1 + \cdots + a_r\mathbf{v}_r + c\mathbf{y} : \begin{array}{l} -\infty < a_u < \infty \\ 0 < c < \infty \end{array} \right\} = L^{+}(\mathbf{v}_1, \ldots, \mathbf{v}_r; \mathbf{y}).$$

Suppose now that $n > r + 1$ and that $\mathbf{v}_1, \ldots, \mathbf{v}_r$ are linearly independent; this
avoids trivial cases with more quantities than effective measurements. The
orbit is a half $(r + 1)$-space; it is contained in the $(r + 1)$-dimensional
subspace

$$L(\mathbf{v}_1, \ldots, \mathbf{v}_r, \mathbf{y}) = \{a_1\mathbf{v}_1 + \cdots + a_r\mathbf{v}_r + a_{r+1}\mathbf{y} : -\infty < a_u < \infty\}$$

but consists of the positive half corresponding to positive coefficient for $\mathbf{y}$.
The orbit is subtended by the $r$-dimensional subspace

$$L(\mathbf{v}_1, \ldots, \mathbf{v}_r),$$

and it consists of the positive translates

$$L(\mathbf{v}_1, \ldots, \mathbf{v}_r) + c\mathbf{y}$$

of that subspace; see Figures 1 and 2. The points $\mathbf{y}$ in the subspace $L(\mathbf{v}_1, \ldots, \mathbf{v}_r)$
are implicitly excluded without loss of essential generality in the sequel.

A point $\mathbf{y}$ is carried into a point

$$\tilde{\mathbf{y}} = a_1\mathbf{v}_1 + \cdots + a_r\mathbf{v}_r + c\mathbf{y}$$

by a transformation $g$. The vectors $\mathbf{v}_1, \ldots, \mathbf{v}_r, \mathbf{y}$ are linearly independent;
there is then no alternative choice for a transformation carrying $\mathbf{y}$ into $\tilde{\mathbf{y}}$.
It follows that $G$ is unitary on $R^n$ (subtending subspace excluded). And it
follows then that the regression model is a structural model.

3 A TRANSFORMATION VARIABLE 

Consider the choice of a transformation variable $[Y]$ to describe the
position of a point $Y$ on its orbit. For the measurement model the location
variable $\bar{x}$ gave the projection $\bar{x}\mathbf{1}$ of the vector $\mathbf{x}$ onto the one-dimensional
subspace $L(\mathbf{1})$, the scale variable $s_x$ gave the distance of $\mathbf{x}$ from $L(\mathbf{1})$ (units of
length $(n - 1)^{1/2}$), and the transformation $[\bar{x}, s_x]$ gave position (see Figure 6).
A transformation variable can be defined for the linear regression model in an
analogous way: location can be described by the projection of $\mathbf{y}$ into the
subtending subspace $L(\mathbf{v}_1, \ldots, \mathbf{v}_r)$, scale can be given by the distance of $\mathbf{y}$
from the subspace, and position by combining these into a transformation
matrix.

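Consider a point $\mathbf{y}$ (or $Y$) in $R^n$, and let

$$b_1(\mathbf{y})\mathbf{v}_1 + \cdots + b_r(\mathbf{y})\mathbf{v}_r$$

be the projection of $\mathbf{y}$ into the $r$-dimensional subspace $L(\mathbf{v}_1, \ldots, \mathbf{v}_r)$: the
projection of $\mathbf{y}$ into the subspace $L(\mathbf{v}_1, \ldots, \mathbf{v}_r)$ is that vector

$$b_1\mathbf{v}_1 + \cdots + b_r\mathbf{v}_r$$

in the subspace for which the residual vector

$$\mathbf{y} - (b_1\mathbf{v}_1 + \cdots + b_r\mathbf{v}_r)$$

is orthogonal to each of $\mathbf{v}_1, \ldots, \mathbf{v}_r$, and hence to each vector in $L(\mathbf{v}_1, \ldots, \mathbf{v}_r)$
(see Figure 7). The orthogonality conditions in the definition are

$$(\mathbf{y} - b_1\mathbf{v}_1 - \cdots - b_r\mathbf{v}_r, \mathbf{v}_1) = 0$$
$$\vdots$$
$$(\mathbf{y} - b_1\mathbf{v}_1 - \cdots - b_r\mathbf{v}_r, \mathbf{v}_r) = 0,$$

where $(\mathbf{x}, \mathbf{y})$ designates the inner product

$$(\mathbf{x}, \mathbf{y}) = x_1y_1 + \cdots + x_ny_n = (\mathbf{y}, \mathbf{x})$$

of the vectors $\mathbf{x}$ and $\mathbf{y}$. The inner product is linear in each argument:

$$(a_1\mathbf{x}_1 + a_2\mathbf{x}_2, \mathbf{y}) = a_1(\mathbf{x}_1, \mathbf{y}) + a_2(\mathbf{x}_2, \mathbf{y}),$$

$$(\mathbf{x}, b_1\mathbf{y}_1 + b_2\mathbf{y}_2) = b_1(\mathbf{x}, \mathbf{y}_1) + b_2(\mathbf{x}, \mathbf{y}_2);$$

accordingly, the conditions can be rearranged to give the orthogonality
equations:

$$(\mathbf{v}_1, \mathbf{v}_1)b_1 + \cdots + (\mathbf{v}_1, \mathbf{v}_r)b_r = (\mathbf{v}_1, \mathbf{y})$$
$$\vdots$$
$$(\mathbf{v}_r, \mathbf{v}_1)b_1 + \cdots + (\mathbf{v}_r, \mathbf{v}_r)b_r = (\mathbf{v}_r, \mathbf{y}).$$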

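The inner products as they appear in the preceding array form the first $r$
rows of the matrix product:

$$YY' = \begin{pmatrix} v_{11} & \cdots & v_{1n} \\ \vdots & & \vdots \\ v_{r1} & \cdots & v_{rn} \\ y_1 & \cdots & y_n \end{pmatrix} \begin{pmatrix} v_{11} & \cdots & v_{r1} & y_1 \\ \vdots & & \vdots & \vdots \\ v_{1n} & \cdots & v_{rn} & y_n \end{pmatrix} = \begin{pmatrix} (\mathbf{v}_1, \mathbf{v}_1) & \cdots & (\mathbf{v}_1, \mathbf{v}_r) & (\mathbf{v}_1, \mathbf{y}) \\ \vdots & & \vdots & \vdots \\ (\mathbf{v}_r, \mathbf{v}_1) & \cdots & (\mathbf{v}_r, \mathbf{v}_r) & (\mathbf{v}_r, \mathbf{y}) \\ (\mathbf{y}, \mathbf{v}_1) & \cdots & (\mathbf{y}, \mathbf{v}_r) & (\mathbf{y}, \mathbf{y}) \end{pmatrix}.$$

Let the first $r$ rows of $Y$ be designated

$$V = \begin{pmatrix} v_{11} & \cdots & v_{1n} \\ \vdots & & \vdots \\ v_{r1} & \cdots & v_{rn} \end{pmatrix};$$

the orthogonality equations can then be written

$$VV' \begin{pmatrix} b_1 \\ \vdots \\ b_r \end{pmatrix} = V\mathbf{y}.$$

The vectors $\mathbf{v}_1, \ldots, \mathbf{v}_r$ have been assumed linearly independent: the matrix
$V$ is then of rank $r$ and the matrix $VV'$ is then of rank $r$, hence nonsingular.
The orthogonality equations can then be solved uniquely giving

$$\begin{pmatrix} b_1(\mathbf{y}) \\ \vdots \\ b_r(\mathbf{y}) \end{pmatrix} = (VV')^{-1}V\mathbf{y}.$$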


The projection of $\mathbf{y}$ into the subspace $L(\mathbf{v}_1, \ldots, \mathbf{v}_r)$ is

$$b_1(\mathbf{y})\mathbf{v}_1 + \cdots + b_r(\mathbf{y})\mathbf{v}_r,$$

where the coefficients $b_1(\mathbf{y}), \ldots, b_r(\mathbf{y})$ are given by

$$(VV')^{-1}V\mathbf{y};$$

the coefficients are called the regression coefficients of $\mathbf{y}$ on $\mathbf{v}_1, \ldots, \mathbf{v}_r$. In the
case of a single $\mathbf{v}$ the regression coefficient is

$$b(\mathbf{y}) = \frac{(\mathbf{v}, \mathbf{y})}{(\mathbf{v}, \mathbf{v})};$$

the general form is analogous: the inner products with $\mathbf{y}$ multiplied on the
left by the inverse of the inner-product matrix.
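As an illustration, the orthogonality equations can be solved directly for a small case. The sketch below is an assumption-laden example rather than the procedure of the text: the helper names `inner` and `regression_coefficients` and the data are hypothetical, and the solution is obtained by Gauss-Jordan elimination in exact rational arithmetic.

```python
from fractions import Fraction

def inner(x, y):
    """Inner product (x, y) = x_1 y_1 + ... + x_n y_n."""
    return sum(Fraction(a) * b for a, b in zip(x, y))

def regression_coefficients(V, y):
    """Solve (VV') b = V y; the rows of V are assumed linearly independent."""
    r = len(V)
    # Augmented array [VV' | Vy] of the orthogonality equations.
    A = [[inner(V[u], V[w]) for w in range(r)] + [inner(V[u], y)]
         for u in range(r)]
    for k in range(r):
        # Bring a row with a nonzero entry in column k into position k.
        p = next(i for i in range(k, r) if A[i][k] != 0)
        A[k], A[p] = A[p], A[k]
        pivot = A[k][k]
        A[k] = [x / pivot for x in A[k]]
        for i in range(r):
            if i != k:
                m = A[i][k]
                A[i] = [x - m * z for x, z in zip(A[i], A[k])]
    return [A[u][r] for u in range(r)]

# Hypothetical data: two structural vectors and a response in R^4.
V = [[1, 1, 1, 1], [0, 1, 2, 3]]
y = [1, 3, 2, 6]
b = regression_coefficients(V, y)
residual = [yi - sum(bu * vu[i] for bu, vu in zip(b, V)) for i, yi in enumerate(y)]
print(b)
print([inner(residual, v) for v in V])
```

The residual is orthogonal to each structural vector, which is the defining property of the projection.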

The projection of $\mathbf{y}$ into the subspace $L(\mathbf{v}_1, \ldots, \mathbf{v}_r)$ can be defined alternatively
as that point

$$b_1\mathbf{v}_1 + \cdots + b_r\mathbf{v}_r$$

in the subspace at minimum distance from $\mathbf{y}$. Let $b_1(\mathbf{y}), \ldots, b_r(\mathbf{y})$ be the
coefficients obtained by solving the orthogonality equations. The difference
vector

$$\mathbf{y} - \sum_u b_u\mathbf{v}_u$$

can be represented as a sum

$$\Bigl(\mathbf{y} - \sum_u b_u(\mathbf{y})\mathbf{v}_u\Bigr) + \sum_u \bigl(b_u(\mathbf{y}) - b_u\bigr)\mathbf{v}_u$$

of two vectors that are orthogonal; accordingly, the squared length of the
original vector,

$$\Bigl|\mathbf{y} - \sum_1^r b_u\mathbf{v}_u\Bigr|^2,$$

is equal† to the squared length of the first vector plus the squared length of
the second vector:

$$\Bigl|\mathbf{y} - \sum_1^r b_u(\mathbf{y})\mathbf{v}_u\Bigr|^2 + \Bigl|\sum_1^r \bigl(b_u(\mathbf{y}) - b_u\bigr)\mathbf{v}_u\Bigr|^2.$$

Choosing $b_1, \ldots, b_r$ to minimize the length of the original vector is equivalent
to choosing $b_1, \ldots, b_r$ to minimize the length of the second vector, but the
second vector can be made equal to the zero vector by choosing $b_u = b_u(\mathbf{y})$.
Thus the projection into the subspace is the closest point in the subspace
(see Figure 8).

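The residual vector is

$$\mathbf{y} - b_1(\mathbf{y})\mathbf{v}_1 - \cdots - b_r(\mathbf{y})\mathbf{v}_r.$$

Let $s(\mathbf{y})$ be the residual length:

$$s(\mathbf{y}) = \Bigl|\mathbf{y} - \sum_u b_u(\mathbf{y})\mathbf{v}_u\Bigr|,$$

$$s^2(\mathbf{y}) = \Bigl|\mathbf{y} - \sum_u b_u(\mathbf{y})\mathbf{v}_u\Bigr|^2 = \Bigl(\mathbf{y} - \sum_u b_u(\mathbf{y})\mathbf{v}_u,\ \mathbf{y} - \sum_u b_u(\mathbf{y})\mathbf{v}_u\Bigr);$$

and let $\mathbf{d}(\mathbf{y})$ be the unit residual vector:

$$\mathbf{d}(\mathbf{y}) = s^{-1}(\mathbf{y})\bigl(\mathbf{y} - b_1(\mathbf{y})\mathbf{v}_1 - \cdots - b_r(\mathbf{y})\mathbf{v}_r\bigr).$$

† Pythagoras. If $\mathbf{x}$ and $\mathbf{y}$ are orthogonal, $(\mathbf{x}, \mathbf{y}) = 0$, then $|\mathbf{x} + \mathbf{y}|^2 = (\mathbf{x} + \mathbf{y}, \mathbf{x} + \mathbf{y}) =
(\mathbf{x}, \mathbf{x}) + 2(\mathbf{x}, \mathbf{y}) + (\mathbf{y}, \mathbf{y}) = (\mathbf{x}, \mathbf{x}) + (\mathbf{y}, \mathbf{y}) = |\mathbf{x}|^2 + |\mathbf{y}|^2$.

Figure 8 A general point $\sum b_u\mathbf{v}_u$ in the subspace $L(\mathbf{v}_1, \mathbf{v}_2)$ and the projection $\sum b_u(\mathbf{y})\mathbf{v}_u$ of
$\mathbf{y}$ into the subspace $L(\mathbf{v}_1, \mathbf{v}_2)$. The projection point $\sum b_u(\mathbf{y})\mathbf{v}_u$ is the point in $L(\mathbf{v}_1, \mathbf{v}_2)$
that is closest to $\mathbf{y}$.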

The unit residual vector is orthogonal to the subspace $L(\mathbf{v}_1, \ldots, \mathbf{v}_r)$, has
unit length, and is a vector in $L^{+}(\mathbf{v}_1, \ldots, \mathbf{v}_r; \mathbf{y})$.

The vector $\mathbf{y}$ can be reconstructed from the regression coefficients and
residual length:

$$\mathbf{y} = b_1(\mathbf{y})\mathbf{v}_1 + \cdots + b_r(\mathbf{y})\mathbf{v}_r + s(\mathbf{y})\,\mathbf{d}(\mathbf{y}).$$
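The reconstruction can be verified on a small case. The following sketch uses hypothetical data, and the coefficients $b = (0.9, 1.4)$ are assumed to be the solution of the orthogonality equations for these data; it computes the residual length and the unit residual vector and then rebuilds $\mathbf{y}$.

```python
import math

def inner(x, y):
    """Inner product (x, y) = x_1 y_1 + ... + x_n y_n."""
    return sum(a * b for a, b in zip(x, y))

# Hypothetical data; b solves the orthogonality equations for these vectors.
V = [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 2.0, 3.0]]
y = [1.0, 3.0, 2.0, 6.0]
b = [0.9, 1.4]

# Projection of y into L(v_1, v_2), then residual length and unit residual vector.
fitted = [sum(bu * vu[i] for bu, vu in zip(b, V)) for i in range(len(y))]
residual = [yi - fi for yi, fi in zip(y, fitted)]
s = math.sqrt(inner(residual, residual))
d = [ri / s for ri in residual]

# y = b_1 v_1 + b_2 v_2 + s d
rebuilt = [fi + s * di for fi, di in zip(fitted, d)]
print(s * s)
print(rebuilt)
```

The pair (coefficients, residual length) gives the position on the orbit; the vector `d` is the reference point that identifies the orbit.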

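This can be expressed in matrix notation:

$$Y = \begin{pmatrix} \mathbf{v}_1' \\ \vdots \\ \mathbf{v}_r' \\ \mathbf{y}' \end{pmatrix} = \begin{pmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ b_1(\mathbf{y}) & \cdots & b_r(\mathbf{y}) & s(\mathbf{y}) \end{pmatrix} \begin{pmatrix} \mathbf{v}_1' \\ \vdots \\ \mathbf{v}_r' \\ \mathbf{d}'(\mathbf{y}) \end{pmatrix} = [Y]\,D(Y).$$

The matrix

$$[Y] = \begin{pmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ b_1(\mathbf{y}) & \cdots & b_r(\mathbf{y}) & s(\mathbf{y}) \end{pmatrix}$$

is an element of the regression-scale group; the matrix

$$D(Y) = \begin{pmatrix} \mathbf{v}_1' \\ \vdots \\ \mathbf{v}_r' \\ \mathbf{d}'(\mathbf{y}) \end{pmatrix} = \begin{pmatrix} v_{11} & \cdots & v_{1n} \\ \vdots & & \vdots \\ v_{r1} & \cdots & v_{rn} \\ d_1(\mathbf{y}) & \cdots & d_n(\mathbf{y}) \end{pmatrix}$$

then designates a vector on the orbit of $\mathbf{y}$, a vector $\mathbf{d}(\mathbf{y})$ which has unit length
and is orthogonal to the subtending subspace $L(\mathbf{v}_1, \ldots, \mathbf{v}_r)$. It follows that
$D(Y)$ is a fixed point, a reference point, on the orbit $GY$. And it follows then
that $[Y]$ is a transformation variable. The transformation $[Y]$ as an element
of a duplicate group $G^*$ gives the position of $Y$ on its orbit; see Figure 7.
The regression model can now be written

$$f(E)\,dE,$$

$$[Y] = \theta[E], \qquad D(Y) = D(E).$$

The structural equation conditional on the orbit has simple form:

$$b_1(\mathbf{y}) = \beta_1 + \sigma b_1(\mathbf{e})$$
$$\vdots$$
$$b_r(\mathbf{y}) = \beta_r + \sigma b_r(\mathbf{e}),$$
$$s(\mathbf{y}) = \sigma s(\mathbf{e}).$$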

The regression coefficients and residual length have produced a transformation
variable with some convenient matrix properties. The regression coefficients
and residual length are based on Euclidean distance:

$$|\mathbf{y} - \mathbf{x}| = \Bigl[\sum (y_i - x_i)^2\Bigr]^{1/2}$$

and on the related inner product:

$$(\mathbf{x}, \mathbf{y}) = \sum x_iy_i.$$

The use of other distance functions such as

$$d(\mathbf{y}, \mathbf{x}) = \sum |y_i - x_i|$$

can produce other transformation variables in an analogous manner.

4 WITH ORTHOGONAL BASIS 

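In the examples in Section 1 structural vectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ were
introduced to describe successively more complex dependence of the general
response level on controllable variables. Then, as an alternate, structural
vectors $\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3$ were successively introduced to describe the same
successively more complex dependence. The vectors $\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3$ were mutually
orthogonal, and there was mention that this orthogonality had advantages.

Now consider in general the regression model and suppose that the
structural vectors $\mathbf{v}_1, \ldots, \mathbf{v}_r$ are in a natural order of decreasing intrusiveness
in the manner indicated by the examples. The corresponding sequence
of orthogonal structural vectors can be constructed by the results in Section 3.
For notation, consider the equation

$$W = \begin{pmatrix} \mathbf{w}_1' \\ \mathbf{w}_2' \\ \mathbf{w}_3' \\ \vdots \\ \mathbf{w}_r' \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ -p_{21} & 1 & 0 & \cdots & 0 \\ -p_{31} & -p_{32} & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ -p_{r1} & -p_{r2} & \cdots & -p_{r\,r-1} & 1 \end{pmatrix} \begin{pmatrix} \mathbf{v}_1' \\ \mathbf{v}_2' \\ \mathbf{v}_3' \\ \vdots \\ \mathbf{v}_r' \end{pmatrix} = PV.$$

Let $p_{21}$ be the regression coefficient of $\mathbf{v}_2$ on $\mathbf{v}_1$; the vector $\mathbf{w}_2$ is then the
corresponding residual vector and is orthogonal to $\mathbf{v}_1 = \mathbf{w}_1$. Let $p_{31}, p_{32}$ be
the regression coefficients of $\mathbf{v}_3$ on $\mathbf{v}_1, \mathbf{v}_2$; the vector $\mathbf{w}_3$ is then the corresponding
residual vector and is orthogonal to $\mathbf{v}_1, \mathbf{v}_2$ and hence to $\mathbf{w}_1, \mathbf{w}_2$. Finally, let
$p_{r1}, \ldots, p_{r\,r-1}$ be the regression coefficients of $\mathbf{v}_r$ on $\mathbf{v}_1, \ldots, \mathbf{v}_{r-1}$; the vector
$\mathbf{w}_r$ is then the corresponding residual vector and is orthogonal to $\mathbf{v}_1, \ldots, \mathbf{v}_{r-1}$
and hence to $\mathbf{w}_1, \ldots, \mathbf{w}_{r-1}$. The matrix $W$ records the orthogonal structural
vectors $\mathbf{w}_1, \ldots, \mathbf{w}_r$ derived successively from the structural vectors in $V$.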
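The successive construction can be sketched as follows. This is an illustration under stated assumptions, not the computation method of the text: each $\mathbf{w}_u$ is obtained here by removing from $\mathbf{v}_u$ its projections on the earlier orthogonal vectors $\mathbf{w}_1, \ldots, \mathbf{w}_{u-1}$, which gives the same residual as regression on $\mathbf{v}_1, \ldots, \mathbf{v}_{u-1}$ because the two sets span the same subspace. The function names are hypothetical, and the structural vectors are chosen to agree with the inner products 12, 9, 9, 15 reported for the examples in Section 6.

```python
from fractions import Fraction

def inner(x, y):
    """Inner product (x, y) = x_1 y_1 + ... + x_n y_n."""
    return sum(Fraction(a) * b for a, b in zip(x, y))

def orthogonal_basis(V):
    """Return (W, Pinv): orthogonal vectors w_u and the coefficients with V = Pinv W."""
    W, Pinv = [], []
    for u, v in enumerate(V):
        # Individual regression coefficients of v_u on w_1, ..., w_{u-1}.
        coefs = [inner(v, w) / inner(w, w) for w in W]
        # Residual of v_u after removing its projection on the earlier vectors.
        w = [Fraction(x) - sum(c * wk[i] for c, wk in zip(coefs, W))
             for i, x in enumerate(v)]
        W.append(w)
        Pinv.append(coefs + [Fraction(1)] + [Fraction(0)] * (len(V) - u - 1))
    return W, Pinv

# Assumed structural vectors (n = 12): constant, treatment indicator, variable.
V = [[1] * 12, [0] * 3 + [1] * 9, [0] * 6 + [1] * 3 + [2] * 3]
W, Pinv = orthogonal_basis(V)
for w in W:
    print([str(x) for x in w])
print([[str(x) for x in row] for row in Pinv])
```

The rows of `Pinv` are the rows of the inverse matrix $P^{-1}$: they express each $\mathbf{v}_u$ in terms of the orthogonal vectors.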

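The model can be presented in terms of the alternative basis for the
subtending subspace. Let

$$\tilde{Y} = \begin{pmatrix} w_{11} & \cdots & w_{1n} \\ \vdots & & \vdots \\ w_{r1} & \cdots & w_{rn} \\ y_1 & \cdots & y_n \end{pmatrix}$$

designate the response vector $\mathbf{y}$ but with orthogonal structural vectors
appended; let

$$\tilde{E} = \begin{pmatrix} w_{11} & \cdots & w_{1n} \\ \vdots & & \vdots \\ w_{r1} & \cdots & w_{rn} \\ e_1 & \cdots & e_n \end{pmatrix}$$

designate the error vector $\mathbf{e}$ but with orthogonal structural vectors appended;
and let

$$\tilde{\theta} = \begin{pmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ \alpha_1 & \cdots & \alpha_r & \sigma \end{pmatrix}$$

designate the composite quantity appropriate to the new orthogonal structural
vectors. The regression model can then be expressed in the alternative
form

$$f(\tilde{E})\,d\tilde{E},$$

$$\tilde{Y} = \tilde{\theta}\tilde{E}.$$

The new matrix $\tilde{Y}$ can be expressed in terms of the original matrix $Y$:

$$\tilde{Y} = \begin{pmatrix} P & 0 \\ 0 & 1 \end{pmatrix} Y,$$

where $P$ is the lower triangular matrix with unit diagonal and with the
elements $-p_{21}$; $-p_{31}, -p_{32}$; $\ldots$ below the diagonal.
And correspondingly $\tilde{E}$ can be expressed in terms of $E$:

$$\tilde{E} = \begin{pmatrix} P & 0 \\ 0 & 1 \end{pmatrix} E.$$

The new structural equation can be related to the old:

$$\tilde{Y} = \begin{pmatrix} P & 0 \\ 0 & 1 \end{pmatrix} Y = \begin{pmatrix} P & 0 \\ 0 & 1 \end{pmatrix} \theta E = \begin{pmatrix} P & 0 \\ 0 & 1 \end{pmatrix} \theta \begin{pmatrix} P & 0 \\ 0 & 1 \end{pmatrix}^{-1} \tilde{E}.$$

The new quantity in terms of the old can then be extracted:

$$\tilde{\theta} = \begin{pmatrix} P & 0 \\ 0 & 1 \end{pmatrix} \theta \begin{pmatrix} P^{-1} & 0 \\ 0 & 1 \end{pmatrix};$$

with $\sigma$ common to $\tilde{\theta}$ and $\theta$, this can be expressed more briefly as

$$(\alpha_1, \ldots, \alpha_r) = (\beta_1, \ldots, \beta_r)P^{-1}.$$

The inverse of $P$ has the following form:

$$P^{-1} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ p^{21} & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ p^{r1} & \cdots & p^{r\,r-1} & 1 \end{pmatrix}.$$

then has the same property; hence the residual length $s(\mathbf{y})$ and the unit
residual vector $\mathbf{d}(\mathbf{y})$ also have this property. The matrix $\tilde{Y}$ can now be
factored:

$$\tilde{Y} = [\tilde{Y}]\,D(\tilde{Y}) = [\tilde{Y}] \begin{pmatrix} w_{11} & \cdots & w_{1n} \\ \vdots & & \vdots \\ w_{r1} & \cdots & w_{rn} \\ d_1(\mathbf{y}) & \cdots & d_n(\mathbf{y}) \end{pmatrix}, \qquad [\tilde{Y}] = \begin{pmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ a_1(\mathbf{y}) & \cdots & a_r(\mathbf{y}) & s(\mathbf{y}) \end{pmatrix}$$

(see Figure 9). Note that the reference point in matrix form $D(\tilde{Y})$ records
the unit residual vector but with the orthogonal structural vectors appended.

The new position $[\tilde{Y}]$ in terms of the old can be extracted in the same
manner as the new quantity $\tilde{\theta}$ in terms of the old:

$$\tilde{Y} = \begin{pmatrix} P & 0 \\ 0 & 1 \end{pmatrix} Y, \qquad D(\tilde{Y}) = \begin{pmatrix} P & 0 \\ 0 & 1 \end{pmatrix} D(Y),$$

$$[\tilde{Y}]\,D(\tilde{Y}) = \begin{pmatrix} P & 0 \\ 0 & 1 \end{pmatrix} [Y]\,D(Y) = \begin{pmatrix} P & 0 \\ 0 & 1 \end{pmatrix} [Y] \begin{pmatrix} P & 0 \\ 0 & 1 \end{pmatrix}^{-1} D(\tilde{Y}),$$

$$[\tilde{Y}] = \begin{pmatrix} P & 0 \\ 0 & 1 \end{pmatrix} [Y] \begin{pmatrix} P^{-1} & 0 \\ 0 & 1 \end{pmatrix}.$$

With $s(\mathbf{y})$ common to $[\tilde{Y}]$ and $[Y]$, this can be expressed more briefly as

$$(a_1(\mathbf{y}), \ldots, a_r(\mathbf{y})) = (b_1(\mathbf{y}), \ldots, b_r(\mathbf{y}))P^{-1}.$$

Figure 9 The projection $a_1(\mathbf{y})\mathbf{w}_1 + a_2(\mathbf{y})\mathbf{w}_2$ of $\mathbf{y}$ into the two-dimensional subspace
$L(\mathbf{w}_1, \mathbf{w}_2) = L(\mathbf{v}_1, \mathbf{v}_2)$. Squared lengths of vectors are recorded.

These equations have the same form as the corresponding equations for
$\tilde{\theta}$ in terms of $\theta$, and $(\alpha_1, \ldots, \alpha_r)$ in terms of $(\beta_1, \ldots, \beta_r)$. Note that $a_r(\mathbf{y}) = b_r(\mathbf{y})$.

The regression model in alternative form can now be expressed with composite
structural equation:

$$f(\tilde{E})\,d\tilde{E},$$

$$[\tilde{Y}] = \tilde{\theta}[\tilde{E}], \qquad D(\tilde{Y}) = D(\tilde{E}).$$

The structural equation conditional on the orbit can be written

$$a_1(\mathbf{y}) = \alpha_1 + \sigma a_1(\mathbf{e})$$
$$\vdots$$
$$a_r(\mathbf{y}) = \alpha_r + \sigma a_r(\mathbf{e}),$$
$$s(\mathbf{y}) = \sigma s(\mathbf{e}).$$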


The orthogonal structural vectors $\mathbf{w}_1, \ldots, \mathbf{w}_r$ can provide directions for
the first $r$ of a new set of axes. For the first axis, $a_1(\mathbf{y})$ measures distance in
units of length $|\mathbf{w}_1|$; the coordinate for the first axis is then

$$a_1(\mathbf{y})\,|\mathbf{w}_1| = \frac{(\mathbf{w}_1, \mathbf{y})}{|\mathbf{w}_1|} = \frac{w_{11}y_1 + \cdots + w_{1n}y_n}{|\mathbf{w}_1|}.$$

Similarly the coordinate for the $u$th axis is

$$a_u(\mathbf{y})\,|\mathbf{w}_u| = \frac{w_{u1}y_1 + \cdots + w_{un}y_n}{|\mathbf{w}_u|}.$$

The sum of squares of coordinates of a point is unchanged by an
orthogonal transformation. The sum of squares $\sum y_i^2$ can be expressed as a
sum of squares with respect to the new axes:

$$\sum_1^n y_i^2 = \sum_1^r a_u^2(\mathbf{y})\,|\mathbf{w}_u|^2 + s^2(\mathbf{y}).$$

Figure 10 The observation $Y$ as $\mathbf{y}$; the error $E$ as $\mathbf{e}$; the transformation

This can be recorded component by component in an analysis-of-variance
table.

Source              Dimension   Component                              Structure of Component
Mean ($\mathbf{w}_1$)        1           $(a_1(\mathbf{y})\,|\mathbf{w}_1|)^2$      $(\alpha_1|\mathbf{w}_1| + \sigma a_1(\mathbf{e})\,|\mathbf{w}_1|)^2$
Treatment ($\mathbf{w}_2$)   1           $(a_2(\mathbf{y})\,|\mathbf{w}_2|)^2$      $(\alpha_2|\mathbf{w}_2| + \sigma a_2(\mathbf{e})\,|\mathbf{w}_2|)^2$
Variable ($\mathbf{w}_3$)    1           $(a_3(\mathbf{y})\,|\mathbf{w}_3|)^2$      $(\alpha_3|\mathbf{w}_3| + \sigma a_3(\mathbf{e})\,|\mathbf{w}_3|)^2$
Residual ($\mathbf{d}$)      $n - 3$     $s^2(\mathbf{y})$                          $(\sigma s(\mathbf{e}))^2$
Total               $n$         $\sum y_i^2$

The sources for the components are labeled in the manner appropriate to the
examples in Section 1.

5 COMPUTATION

The position $[Y]$ is a matrix containing regression coefficients $b_1(\mathbf{y}), \ldots,
b_r(\mathbf{y})$ and a residual length $s(\mathbf{y})$. The matrix $P$, which produces the new
orthogonal basis $W = PV$, contains regression coefficients of v-vectors on
v-vectors. And the matrix $P^{-1}$, which produces the new quantities

$$(\alpha_1, \ldots, \alpha_r) = (\beta_1, \ldots, \beta_r)P^{-1}$$

and the new regression coefficients

$$(a_1, \ldots, a_r) = (b_1, \ldots, b_r)P^{-1},$$

also contains regression coefficients, the regression coefficients in fact of
v-vectors on w-vectors. All of these regression coefficients can be calculated
from the elements of the inner-product matrix $YY'$. They can all be calculated
by a simple repetitive operation applied to that matrix.

The three examples in Section 1 concern a single response vector $\mathbf{y}$ and its
relation to structural vectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$: to the first vector; to the first two
vectors; to the first three vectors. Correspondingly, in general, there may be
interest in a response vector $\mathbf{y}$ and its relation to structural vectors

$$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r:$$

to the first vector; to the first two vectors; to the first three vectors; ....
All the regression coefficients for each step can be calculated from the inner
products that appear in the matrix $YY'$. In fact they are obtained as part
of applying the repetitive operation to the matrix $YY'$.

The notation can be extended temporarily to handle the succession of
steps by introducing a superscript to indicate the number of structural
vectors involved: with $r$ structural vectors the response matrix is $Y^{(r)}$; the
regression coefficients are $b_1^{(r)}(\mathbf{y}), \ldots, b_r^{(r)}(\mathbf{y})$; the residual length is $s^{(r)}(\mathbf{y})$.

The simple operation can be described most easily by considering two
vectors $\mathbf{y}_1, \mathbf{y}_2$ and a single structural vector $\mathbf{v}$. The regression coefficients are

$$b(\mathbf{y}_1) = \frac{(\mathbf{v}, \mathbf{y}_1)}{(\mathbf{v}, \mathbf{v})}, \qquad b(\mathbf{y}_2) = \frac{(\mathbf{v}, \mathbf{y}_2)}{(\mathbf{v}, \mathbf{v})}.$$


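A typical inner product of residuals is

$$(\mathbf{y}_i - b(\mathbf{y}_i)\mathbf{v},\ \mathbf{y}_j - b(\mathbf{y}_j)\mathbf{v})$$
$$= (\mathbf{y}_i, \mathbf{y}_j) - b(\mathbf{y}_i)(\mathbf{v}, \mathbf{y}_j) - b(\mathbf{y}_j)(\mathbf{v}, \mathbf{y}_i) + b(\mathbf{y}_i)b(\mathbf{y}_j)(\mathbf{v}, \mathbf{v})$$
$$= (\mathbf{y}_i, \mathbf{y}_j) - \frac{(\mathbf{v}, \mathbf{y}_i)(\mathbf{v}, \mathbf{y}_j)}{(\mathbf{v}, \mathbf{v})}$$

(a generalization of this appears in Problems 14, 15, 16); the inner-product
matrix for residuals is

$$Q(\mathbf{y}_1, \mathbf{y}_2 : \mathbf{v}) = \begin{pmatrix} (\mathbf{y}_1, \mathbf{y}_1) - \dfrac{(\mathbf{v}, \mathbf{y}_1)(\mathbf{v}, \mathbf{y}_1)}{(\mathbf{v}, \mathbf{v})} & (\mathbf{y}_1, \mathbf{y}_2) - \dfrac{(\mathbf{v}, \mathbf{y}_1)(\mathbf{v}, \mathbf{y}_2)}{(\mathbf{v}, \mathbf{v})} \\[2ex] (\mathbf{y}_2, \mathbf{y}_1) - \dfrac{(\mathbf{v}, \mathbf{y}_2)(\mathbf{v}, \mathbf{y}_1)}{(\mathbf{v}, \mathbf{v})} & (\mathbf{y}_2, \mathbf{y}_2) - \dfrac{(\mathbf{v}, \mathbf{y}_2)(\mathbf{v}, \mathbf{y}_2)}{(\mathbf{v}, \mathbf{v})} \end{pmatrix}.$$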

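Now consider the inner-product matrix for $\mathbf{v}, \mathbf{y}_1, \mathbf{y}_2$:

$$\begin{pmatrix} (\mathbf{v}, \mathbf{v}) & (\mathbf{v}, \mathbf{y}_1) & (\mathbf{v}, \mathbf{y}_2) \\ (\mathbf{y}_1, \mathbf{v}) & (\mathbf{y}_1, \mathbf{y}_1) & (\mathbf{y}_1, \mathbf{y}_2) \\ (\mathbf{y}_2, \mathbf{v}) & (\mathbf{y}_2, \mathbf{y}_1) & (\mathbf{y}_2, \mathbf{y}_2) \end{pmatrix}.$$

The simple operation is: Divide through the first row by the leading element
to obtain a new first row; subtract multiples of the new first row from remaining
rows to produce zeros in the positions corresponding to the leading element.

$$\begin{pmatrix} 1 & \bigl(b(\mathbf{y}_1),\ b(\mathbf{y}_2)\bigr) \\ \mathbf{0} & Q(\mathbf{y}_1, \mathbf{y}_2 : \mathbf{v}) \end{pmatrix}$$

The resulting matrix clearly contains the regression coefficients and the inner-product
matrix for residuals. The simple operation is equivalent to left
multiplication by the matrix

$$\begin{pmatrix} \dfrac{1}{(\mathbf{v}, \mathbf{v})} & 0 & 0 \\[2ex] -\dfrac{(\mathbf{y}_1, \mathbf{v})}{(\mathbf{v}, \mathbf{v})} & 1 & 0 \\[2ex] -\dfrac{(\mathbf{y}_2, \mathbf{v})}{(\mathbf{v}, \mathbf{v})} & 0 & 1 \end{pmatrix}.$$

Left multiplication by a matrix produces new rows that are linear combinations
of old rows.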
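The simple operation can be sketched in a few lines. This is a minimal illustration with hypothetical data; the function name `simple_operation` and its argument `k` (the position of the leading element) are assumptions introduced here.

```python
from fractions import Fraction

def simple_operation(M, k):
    """Divide row k by its leading element M[k][k], then subtract multiples
    of the new row k from every other row to produce zeros in column k."""
    pivot = M[k][k]
    new_k = [x / pivot for x in M[k]]
    out = []
    for i, row in enumerate(M):
        if i == k:
            out.append(new_k)
        else:
            out.append([x - row[k] * z for x, z in zip(row, new_k)])
    return out

# Hypothetical vectors: one structural vector v and two response vectors y1, y2.
v, y1, y2 = [1, 1, 1, 1], [1, 3, 2, 6], [2, 0, 1, 1]
vectors = [v, y1, y2]
gram = [[Fraction(sum(a * b for a, b in zip(x, z))) for z in vectors] for x in vectors]

R = simple_operation(gram, 0)
for row in R:
    print([str(x) for x in row])
```

In the result the first row holds the regression coefficients $b(\mathbf{y}_1), b(\mathbf{y}_2)$ and the lower right block holds the inner-product matrix $Q(\mathbf{y}_1, \mathbf{y}_2 : \mathbf{v})$ of the residuals.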

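Now consider the inner product matrix $YY'$:

$$YY' = \begin{pmatrix} (\mathbf{v}_1, \mathbf{v}_1) & \cdots & (\mathbf{v}_1, \mathbf{v}_r) & (\mathbf{v}_1, \mathbf{y}) \\ \vdots & & \vdots & \vdots \\ (\mathbf{v}_r, \mathbf{v}_1) & \cdots & (\mathbf{v}_r, \mathbf{v}_r) & (\mathbf{v}_r, \mathbf{y}) \\ (\mathbf{y}, \mathbf{v}_1) & \cdots & (\mathbf{y}, \mathbf{v}_r) & (\mathbf{y}, \mathbf{y}) \end{pmatrix}.$$

The first $r$ rows contain the matrix array of coefficients for the orthogonality
equations for $b_1^{(r)}(\mathbf{y}), \ldots, b_r^{(r)}(\mathbf{y})$. If an equation is multiplied through by a
constant, if equations are subtracted, if this operation is repeated so that the
coefficients on the left side of the equations become the $r \times r$ identity
matrix, then the new "equations" state that the coefficients on the right side
are the solutions:

$$\left(\begin{array}{ccc|c} 1 & \cdots & 0 & b_1^{(r)}(\mathbf{y}) \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & b_r^{(r)}(\mathbf{y}) \end{array}\right).$$

In the same manner the first $s$ rows and the first $s$ columns plus the last
column give the corresponding matrix array of coefficients for the orthogonality
equations for $b_1^{(s)}(\mathbf{y}), \ldots, b_s^{(s)}(\mathbf{y})$. If the preceding operations are
applied to the first $s$ rows to produce the $s \times s$ identity, then the "coefficients"
on the right side are the solutions:

$$\begin{pmatrix} 1 & \cdots & 0 & * & \cdots & * & b_1^{(s)}(\mathbf{y}) \\ \vdots & \ddots & \vdots & \vdots & & \vdots & \vdots \\ 0 & \cdots & 1 & * & \cdots & * & b_s^{(s)}(\mathbf{y}) \\ * & \cdots & * & * & \cdots & * & * \\ \vdots & & \vdots & \vdots & & \vdots & \vdots \\ * & \cdots & * & * & \cdots & * & * \end{pmatrix}.$$

The simple operation can be applied successively. It generates successively
the $1 \times 1$ identity and the solution $b_1^{(1)}(\mathbf{y})$, the $2 \times 2$ identity and the solutions
$b_1^{(2)}(\mathbf{y}), b_2^{(2)}(\mathbf{y})$, $\ldots$, the $r \times r$ identity and the solutions $b_1^{(r)}(\mathbf{y}), \ldots, b_r^{(r)}(\mathbf{y})$;
and it generates related elements of interest.

For, consider the inner product matrix $YY'$ together with an $(r + 1) \times
(r + 1)$ identity matrix:

$$\left(\begin{array}{cccc|cccc} (\mathbf{v}_1, \mathbf{v}_1) & \cdots & (\mathbf{v}_1, \mathbf{v}_r) & (\mathbf{v}_1, \mathbf{y}) & 1 & \cdots & 0 & 0 \\ \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ (\mathbf{v}_r, \mathbf{v}_1) & \cdots & (\mathbf{v}_r, \mathbf{v}_r) & (\mathbf{v}_r, \mathbf{y}) & 0 & \cdots & 1 & 0 \\ (\mathbf{y}, \mathbf{v}_1) & \cdots & (\mathbf{y}, \mathbf{v}_r) & (\mathbf{y}, \mathbf{y}) & 0 & \cdots & 0 & 1 \end{array}\right).$$

Apply the simple operation to the rows of the augmented matrix with the
first-row-first-column element as leading element:

$$\left(\begin{array}{c|c|c|c} 1 & p^{21}\ \ \cdots\ \ p^{r1}\ \ \ b_1^{(1)}(\mathbf{y}) & (\mathbf{v}_1, \mathbf{v}_1)^{-1} & 0\ \ \cdots\ \ 0 \\ \hline \mathbf{0} & Q(\mathbf{v}_2, \ldots, \mathbf{v}_r, \mathbf{y} : \mathbf{v}_1) & * & I \end{array}\right).$$

The array contains the "solution" $b_1^{(1)}(\mathbf{y})$, the regression coefficient of $\mathbf{y}$ on
$\mathbf{v}_1$; it contains the matrix $Q$ of inner products of residuals; it contains the
inverse of $(\mathbf{v}_1, \mathbf{v}_1)$, and it contains the regression coefficient $p_{21}$ $(= p^{21})$ of
$\mathbf{v}_2$ on $\mathbf{v}_1$. The array also contains the elements $p^{21}, \ldots, p^{r1}$ for the first
column of the inverse matrix $P^{-1}$: the matrix $P^{-1}$ gives the original basis
from the orthogonal basis:

$$V = P^{-1}W,$$

$$\begin{pmatrix} \mathbf{v}_1' \\ \mathbf{v}_2' \\ \vdots \\ \mathbf{v}_r' \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ p^{21} & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ p^{r1} & \cdots & p^{r\,r-1} & 1 \end{pmatrix} \begin{pmatrix} \mathbf{w}_1' \\ \mathbf{w}_2' \\ \vdots \\ \mathbf{w}_r' \end{pmatrix};$$

the vectors $\mathbf{w}_1, \ldots, \mathbf{w}_r$ are orthogonal; the elements of $P^{-1}$ are individual
regression coefficients of v vectors on w vectors; the elements $p^{21}, \ldots, p^{r1}$
are the individual regression coefficients of the vectors $\mathbf{v}_2, \ldots, \mathbf{v}_r$ on the
vector $\mathbf{w}_1$ $(= \mathbf{v}_1)$.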

Now apply the simple operation to the rows of the modified array with the
second-row-second-column element as leading element:

$$\left(\begin{array}{cc|ccccc|c|c} 1 & 0 & p_{31} & * & \cdots & * & b_1^{(2)}(\mathbf{y}) & (V^{(2)}V^{(2)\prime})^{-1} & 0\ \cdots\ 0 \\ 0 & 1 & p_{32} & p^{42} & \cdots & p^{r2} & b_2^{(2)}(\mathbf{y}) & & 0\ \cdots\ 0 \\ \hline \mathbf{0} & \mathbf{0} & & & Q(\mathbf{v}_3, \ldots, \mathbf{v}_r, \mathbf{y} : \mathbf{v}_1, \mathbf{v}_2) & & & * & I \end{array}\right).$$

The array contains the "solution" $b_1^{(2)}(\mathbf{y}), b_2^{(2)}(\mathbf{y})$, the regression coefficients
of $\mathbf{y}$ on $\mathbf{v}_1, \mathbf{v}_2$; it contains the matrix $Q$ of inner products of residuals; it
contains the regression coefficients $p_{31}, p_{32}$ $(= p^{32})$ of $\mathbf{v}_3$ on $\mathbf{v}_1, \mathbf{v}_2$; and it
contains the individual regression coefficients $p^{32}, \ldots, p^{r2}$ of $\mathbf{v}_3, \ldots, \mathbf{v}_r$
on $\mathbf{w}_2$. The array also contains the inverse of the inner product matrix
$V^{(2)}V^{(2)\prime}$; row operations reduce $V^{(2)}V^{(2)\prime}$ to the $2 \times 2$ identity; the same
row operations applied to the $2 \times 2$ identity must produce the inverse
matrix.

Now apply the simple operation to the rows of the further modified array
with the third-row-third-column element as leading element:

$$\left(\begin{array}{ccc|ccccc|c|c} 1 & 0 & 0 & p_{41} & * & \cdots & * & b_1^{(3)}(\mathbf{y}) & & 0\ \cdots\ 0 \\ 0 & 1 & 0 & p_{42} & * & \cdots & * & b_2^{(3)}(\mathbf{y}) & (V^{(3)}V^{(3)\prime})^{-1} & 0\ \cdots\ 0 \\ 0 & 0 & 1 & p_{43} & p^{53} & \cdots & p^{r3} & b_3^{(3)}(\mathbf{y}) & & 0\ \cdots\ 0 \\ \hline \mathbf{0} & \mathbf{0} & \mathbf{0} & & & Q(\mathbf{v}_4, \ldots, \mathbf{v}_r, \mathbf{y} : \mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3) & & & * & I \end{array}\right).$$

The array contains the "solution" $b_1^{(3)}(\mathbf{y}), b_2^{(3)}(\mathbf{y}), b_3^{(3)}(\mathbf{y})$, the regression
coefficients of $\mathbf{y}$ on $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$; it contains the matrix $Q$ of inner products of
residuals; it contains the inverse matrix $(V^{(3)}V^{(3)\prime})^{-1}$; it contains the regression
coefficients $p_{41}, p_{42}, p_{43}$ $(= p^{43})$ of $\mathbf{v}_4$ on $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$; and it contains the regression
coefficients (individual) of $\mathbf{v}_4, \ldots, \mathbf{v}_r$ on $\mathbf{w}_3$.

The simple operation applied $r$ times produces the regression coefficients
$b_1^{(r)}(\mathbf{y}), \ldots, b_r^{(r)}(\mathbf{y})$, the inner product matrix of residuals, the inverse matrix
$(V^{(r)}V^{(r)\prime})^{-1}$, the elements to give the $(r + 1)$st row of $P$, and the elements to
give the $r$th column of $P^{-1}$.

Justifications. For $b_1^{(s)}(\mathbf{y}), \ldots, b_s^{(s)}(\mathbf{y})$: the first $s$ columns and the $\mathbf{y}$
column contain the coefficients in the orthogonality equations for the
regression coefficients of $\mathbf{y}$ on $\mathbf{v}_1, \ldots, \mathbf{v}_s$; the simple operation applied $s$
times solves these equations; the resulting "equations" present the solutions
in the $\mathbf{y}$ column. For $p_{s+1\,1}, \ldots, p_{s+1\,s}$: the preceding argument with $\mathbf{y}$ replaced
by $\mathbf{v}_{s+1}$. For $(V^{(s)}V^{(s)\prime})^{-1}$: the first $s$ rows and $s$ columns contain
$V^{(s)}V^{(s)\prime}$, and the first $s$ rows and first $s$ columns after the $\mathbf{y}$ column contain
the $s \times s$ identity matrix; the simple operation applied $s$ times amounts to
premultiplication by a matrix, the matrix that carries $V^{(s)}V^{(s)\prime}$ into the
identity, hence carries the identity into $(V^{(s)}V^{(s)\prime})^{-1}$. For the individual
regression coefficients on $\mathbf{w}_{s+1}$ and for
$Q(\mathbf{v}_{s+2}, \ldots, \mathbf{v}_r, \mathbf{y} : \mathbf{v}_1, \ldots, \mathbf{v}_{s+1})$ (by induction from $s$ to $s + 1$): the matrix
$Q(\mathbf{v}_{s+1}, \ldots, \mathbf{v}_r, \mathbf{y} : \mathbf{v}_1, \ldots, \mathbf{v}_s)$ is the inner-product matrix for vectors
$\mathbf{v} - \sum_1^s b_u^{(s)}(\mathbf{v})\mathbf{v}_u$ with $\mathbf{v} = \mathbf{v}_{s+1}, \ldots, \mathbf{v}_r, \mathbf{y}$; the simple operation gives the regression
coefficients $c(\mathbf{v})$ for the vectors $\mathbf{v} - \sum_1^s b_u^{(s)}(\mathbf{v})\mathbf{v}_u$ (with $\mathbf{v} = \mathbf{v}_{s+2}, \ldots, \mathbf{v}_r, \mathbf{y}$)
on the single vector $\mathbf{w}_{s+1} = \mathbf{v}_{s+1} - \sum_1^s b_u^{(s)}(\mathbf{v}_{s+1})\mathbf{v}_u$; since $\mathbf{v}_1, \ldots, \mathbf{v}_s$ are
orthogonal to $\mathbf{w}_{s+1}$, it follows that the $c(\mathbf{v})$ (with $\mathbf{v} = \mathbf{v}_{s+2}, \ldots, \mathbf{v}_r$) are also
the regression coefficients $p^{s+2\,s+1}, \ldots, p^{r\,s+1}$ of the vectors $\mathbf{v}$ (with $\mathbf{v} = \mathbf{v}_{s+2}, \ldots,
\mathbf{v}_r$) on the single vector $\mathbf{w}_{s+1}$. The simple operation gives the inner-product
matrix of the residuals $\mathbf{v} - \sum_1^s b_u^{(s)}(\mathbf{v})\mathbf{v}_u - c(\mathbf{v})\bigl(\mathbf{v}_{s+1} - \sum_1^s b_u^{(s)}(\mathbf{v}_{s+1})\mathbf{v}_u\bigr)$; such a
residual has the form $\mathbf{v}$ minus a linear combination of $\mathbf{v}_1, \ldots, \mathbf{v}_{s+1}$, and it
is orthogonal to $\mathbf{v}_1, \ldots, \mathbf{v}_s, \mathbf{w}_{s+1}$, hence to $\mathbf{v}_1, \ldots, \mathbf{v}_s, \mathbf{v}_{s+1}$; such a residual
must then be the residual of $\mathbf{v}$ orthogonalized to $\mathbf{v}_1, \ldots, \mathbf{v}_s, \mathbf{v}_{s+1}$, and the
inner-product matrix must then be the inner-product matrix of such residuals.

6 THE EXAMPLES 

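Consider the examples in Section 1, and suppose that the response vector
$\mathbf{y}$ is given by the final row in the augmented matrix

$$Y = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 2 & 2 & 2 \\ 7 & 12 & 8 & 15 & 13 & 14 & 17 & 14 & 14 & 17 & 14 & 17 \end{pmatrix}.$$

In this example the structural vectors have a simple form that allows the
orthogonal vectors to be written down by inspection and the regression
coefficients to be calculated as averages. This simple form also allows the
examples to serve as a transparent first illustration of the computation
methods in Section 5.

The inner product matrix with appended identity matrix is

$$\left(\begin{array}{cccc|cccc} 12 & 9 & 9 & 162 & 1 & 0 & 0 & 0 \\ 9 & 9 & 9 & 135 & 0 & 1 & 0 & 0 \\ 9 & 9 & 15 & 141 & 0 & 0 & 1 & 0 \\ 162 & 135 & 141 & 2302 & 0 & 0 & 0 & 1 \end{array}\right).$$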


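A first application of the simple operation produces the numbers appropriate
to the regression model† with one structural vector:

$$\begin{array}{c|cccc|cccc} & \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 & \mathbf{y} & & & & \\ \hline \mathbf{v}_1 & 1 & \tfrac{3}{4} & \tfrac{3}{4} & 13\tfrac{1}{2} & \tfrac{1}{12} & 0 & 0 & 0 \\ \mathbf{v}_2 & 0 & 2\tfrac{1}{4} & 2\tfrac{1}{4} & 13\tfrac{1}{2} & -\tfrac{3}{4} & 1 & 0 & 0 \\ \mathbf{v}_3 & 0 & 2\tfrac{1}{4} & 8\tfrac{1}{4} & 19\tfrac{1}{2} & -\tfrac{3}{4} & 0 & 1 & 0 \\ \mathbf{y} & 0 & 13\tfrac{1}{2} & 19\tfrac{1}{2} & 115 & -13\tfrac{1}{2} & 0 & 0 & 1 \end{array}$$

† The model in this case is also a measurement model but with residual length $s^{(1)}(\mathbf{y})$
replacing standard deviation $s_y$.

With two structural vectors, the position and the reference point are

$$Y^{(2)} = [Y^{(2)}]\,D(Y^{(2)}) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 9 & 6 & \sqrt{34} \end{pmatrix} D(Y^{(2)}),$$

$$D(Y^{(2)}) = \begin{pmatrix} \mathbf{v}_1' \\ \mathbf{v}_2' \\ \mathbf{d}^{(2)\prime} \end{pmatrix}, \qquad \mathbf{d}^{(2)\prime} = \tfrac{1}{\sqrt{34}}\,(-2,\ 3,\ -1,\ 0,\ -2,\ -1,\ 2,\ -1,\ -1,\ 2,\ -1,\ 2),$$

$$W^{(2)} = P^{(2)}V^{(2)} = \begin{pmatrix} 1 & 0 \\ -\tfrac{3}{4} & 1 \end{pmatrix} V^{(2)} = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ -\tfrac{3}{4} & -\tfrac{3}{4} & -\tfrac{3}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} \end{pmatrix}.$$

The analysis-of-variance table can be calculated:

Source                  Dimension   Component   Structure of Component
Mean ($\mathbf{w}_1$)            1           2187        $(\alpha_1\sqrt{12} + \sigma a_1(\mathbf{e})\sqrt{12})^2$
Treatment ($\mathbf{w}_2$)       1           81          $(\alpha_2\sqrt{2\tfrac{1}{4}} + \sigma a_2(\mathbf{e})\sqrt{2\tfrac{1}{4}})^2$
Residual ($\mathbf{d}^{(2)}$)    10          34          $(\sigma s^{(2)}(\mathbf{e}))^2$
Total                   12          2302

The component 81 is the difference between the squared residual length
$(s^{(1)}(\mathbf{y}))^2 = 115$ relative to $\mathbf{v}_1$ and the squared residual length $(s^{(2)}(\mathbf{y}))^2 = 34$
relative to $\mathbf{v}_1$ and $\mathbf{v}_2$. The remaining elements in the matrices $P$ and $P^{-1}$ are
available:

$$P = \begin{pmatrix} 1 & 0 & 0 \\ -\tfrac{3}{4} & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix}, \qquad P^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ \tfrac{3}{4} & 1 & 0 \\ \tfrac{3}{4} & 1 & 1 \end{pmatrix}.$$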

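A third application of the simple operation produces the numbers appropriate
to the regression model with three structural vectors:

$$\begin{array}{c|cccc|cccc} & \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 & \mathbf{y} & & & & \\ \hline \mathbf{v}_1 & 1 & 0 & 0 & 9 & \tfrac{1}{3} & -\tfrac{1}{3} & 0 & 0 \\ \mathbf{v}_2 & 0 & 1 & 0 & 5 & -\tfrac{1}{3} & \tfrac{11}{18} & -\tfrac{1}{6} & 0 \\ \mathbf{v}_3 & 0 & 0 & 1 & 1 & 0 & -\tfrac{1}{6} & \tfrac{1}{6} & 0 \\ \mathbf{y} & 0 & 0 & 0 & 28 & -9 & -5 & -1 & 1 \end{array}$$

The position and the reference point are given in the expressions

$$Y = [Y]\,D(Y) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 9 & 5 & 1 & \sqrt{28} \end{pmatrix} D(Y),$$

$$D(Y) = \begin{pmatrix} \mathbf{v}_1' \\ \mathbf{v}_2' \\ \mathbf{v}_3' \\ \mathbf{d}' \end{pmatrix}, \qquad \mathbf{d}' = \tfrac{1}{\sqrt{28}}\,(-2,\ 3,\ -1,\ 1,\ -1,\ 0,\ 2,\ -1,\ -1,\ 1,\ -2,\ 1),$$

$$W = PV = \begin{pmatrix} 1 & 0 & 0 \\ -\tfrac{3}{4} & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix} V = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ -\tfrac{3}{4} & -\tfrac{3}{4} & -\tfrac{3}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} \\ 0 & 0 & 0 & -1 & -1 & -1 & 0 & 0 & 0 & 1 & 1 & 1 \end{pmatrix}.$$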


The regression coefficients for $[Y]$ are circled in the three matrix arrays.
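The successive applications of the simple operation to the inner product matrix with appended identity can be retraced numerically. The sketch below is an illustration only; the function name `simple_operation` is an assumption, and the starting array is the one given in this section. Exact fractions are used so that the values can be compared directly with the arrays.

```python
from fractions import Fraction

def simple_operation(M, k):
    """Divide row k by its leading element, then clear column k in the other rows."""
    pivot = M[k][k]
    new_k = [x / pivot for x in M[k]]
    return [new_k if i == k else [x - row[k] * z for x, z in zip(row, new_k)]
            for i, row in enumerate(M)]

# Inner product matrix YY' of the examples, with appended 4 x 4 identity.
G = [[12, 9, 9, 162], [9, 9, 9, 135], [9, 9, 15, 141], [162, 135, 141, 2302]]
M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(4)]
     for i, row in enumerate(G)]

residual_sq = [M[3][3]]            # total sum of squares before any step
for k in range(3):
    M = simple_operation(M, k)
    residual_sq.append(M[3][3])    # squared residual length after step k + 1

coefficients = [M[u][3] for u in range(3)]           # b_1, b_2, b_3 in the y column
inverse = [row[4:7] for row in M[:3]]                # inverse of VV'
components = [a - b for a, b in zip(residual_sq, residual_sq[1:])]

print([str(x) for x in coefficients])
print([str(x) for x in residual_sq])
print([str(x) for x in components])
```

The successive differences of squared residual lengths give the analysis-of-variance components for the mean, the treatment, and the variable.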

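The analysis-of-variance table can be calculated:

Source                Dimension   Component   Structure of Component
Mean ($\mathbf{w}_1$)          1           2187        $(\alpha_1\sqrt{12} + \sigma a_1(\mathbf{e})\sqrt{12})^2$
Treatment ($\mathbf{w}_2$)     1           81          $(\alpha_2\sqrt{2\tfrac{1}{4}} + \sigma a_2(\mathbf{e})\sqrt{2\tfrac{1}{4}})^2$
Variable ($\mathbf{w}_3$)      1           6           $(\alpha_3\sqrt{6} + \sigma a_3(\mathbf{e})\sqrt{6})^2$
Residual ($\mathbf{d}$)        9           28          $(\sigma s(\mathbf{e}))^2$
Total                 12          2302

The component 6 is the difference between the squared residual length
$(s^{(2)}(\mathbf{y}))^2 = 34$ and the squared residual length $(s(\mathbf{y}))^2 = 28$.

The numbers are also available for the analysis in terms of the orthogonal
basis. With three structural vectors, the position and reference point are

$$\tilde{Y} = \begin{pmatrix} P & 0 \\ 0 & 1 \end{pmatrix} Y = [\tilde{Y}]\,D(\tilde{Y}) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 13\tfrac{1}{2} & 6 & 1 & \sqrt{28} \end{pmatrix} \begin{pmatrix} \mathbf{w}_1' \\ \mathbf{w}_2' \\ \mathbf{w}_3' \\ \mathbf{d}' \end{pmatrix}.$$

The vectors $\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3, \mathbf{d}$ are mutually orthogonal. Let $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \mathbf{d}$ be the
corresponding unit or normal vectors:

$$\mathbf{u}_1 = \frac{\mathbf{w}_1}{|\mathbf{w}_1|}, \qquad \mathbf{u}_2 = \frac{\mathbf{w}_2}{|\mathbf{w}_2|}, \qquad \mathbf{u}_3 = \frac{\mathbf{w}_3}{|\mathbf{w}_3|};$$

the vectors $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \mathbf{d}$ form an orthonormal set (the vectors $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3$ are
unit vectors for the new axes mentioned at the end of Section 4).

The computations on the matrix $Y$ can be used to illustrate a matrix
factorization that will appear in later sections of this chapter. The matrix $Y$
has been factored:

$$Y = \begin{pmatrix} 1 & 0 & 0 & 0 \\ \tfrac{3}{4} & 1 & 0 & 0 \\ \tfrac{3}{4} & 1 & 1 & 0 \\ 13\tfrac{1}{2} & 6 & 1 & \sqrt{28} \end{pmatrix} \begin{pmatrix} \mathbf{w}_1' \\ \mathbf{w}_2' \\ \mathbf{w}_3' \\ \mathbf{d}' \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ \tfrac{3}{4} & 1 & 0 & 0 \\ \tfrac{3}{4} & 1 & 1 & 0 \\ 13\tfrac{1}{2} & 6 & 1 & \sqrt{28} \end{pmatrix} \begin{pmatrix} \sqrt{12} & 0 & 0 & 0 \\ 0 & \sqrt{2\tfrac{1}{4}} & 0 & 0 \\ 0 & 0 & \sqrt{6} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \mathbf{u}_1' \\ \mathbf{u}_2' \\ \mathbf{u}_3' \\ \mathbf{d}' \end{pmatrix}$$

$$= \begin{pmatrix} \sqrt{12} & 0 & 0 & 0 \\ \tfrac{3}{4}\sqrt{12} & \sqrt{2\tfrac{1}{4}} & 0 & 0 \\ \tfrac{3}{4}\sqrt{12} & \sqrt{2\tfrac{1}{4}} & \sqrt{6} & 0 \\ 13\tfrac{1}{2}\sqrt{12} & 6\sqrt{2\tfrac{1}{4}} & \sqrt{6} & \sqrt{28} \end{pmatrix} \begin{pmatrix} \mathbf{u}_1' \\ \mathbf{u}_2' \\ \mathbf{u}_3' \\ \mathbf{d}' \end{pmatrix},$$

with

$$\mathbf{u}_1' = \tfrac{1}{\sqrt{12}}\,(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), \qquad \mathbf{u}_2' = \tfrac{1}{\sqrt{2\frac{1}{4}}}\,(-\tfrac{3}{4}, -\tfrac{3}{4}, -\tfrac{3}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}),$$

$$\mathbf{u}_3' = \tfrac{1}{\sqrt{6}}\,(0, 0, 0, -1, -1, -1, 0, 0, 0, 1, 1, 1), \qquad \mathbf{d}' = \tfrac{1}{\sqrt{28}}\,(-2, 3, -1, 1, -1, 0, 2, -1, -1, 1, -2, 1).$$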

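The original matrix $Y$ has been factored into a positive lower triangular
matrix and a matrix with four orthonormal vectors. The four row vectors in
the original matrix can be represented by the four row vectors in the triangular
matrix: these row vectors in the triangular matrix are with respect to axes
given by the orthonormal vectors in the final matrix. The inner product
matrices in the two representations are of course equal:

$$YY' = \begin{pmatrix} \sqrt{12} & 0 & 0 & 0 \\ \tfrac{3}{4}\sqrt{12} & \sqrt{2\tfrac{1}{4}} & 0 & 0 \\ \tfrac{3}{4}\sqrt{12} & \sqrt{2\tfrac{1}{4}} & \sqrt{6} & 0 \\ 13\tfrac{1}{2}\sqrt{12} & 6\sqrt{2\tfrac{1}{4}} & \sqrt{6} & \sqrt{28} \end{pmatrix} \begin{pmatrix} \sqrt{12} & 0 & 0 & 0 \\ \tfrac{3}{4}\sqrt{12} & \sqrt{2\tfrac{1}{4}} & 0 & 0 \\ \tfrac{3}{4}\sqrt{12} & \sqrt{2\tfrac{1}{4}} & \sqrt{6} & 0 \\ 13\tfrac{1}{2}\sqrt{12} & 6\sqrt{2\tfrac{1}{4}} & \sqrt{6} & \sqrt{28} \end{pmatrix}',$$

a triangular square-root factorization of $YY'$.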
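The triangular square-root factorization can be checked against the inner product matrix of the examples. The sketch below uses the standard Cholesky recursion, which is an assumption of this illustration and not the method of the text; for a positive definite matrix the lower triangular factor with positive diagonal is unique, so it should agree with the triangular matrix obtained from the regression computations.

```python
import math

def cholesky(G):
    """Lower triangular T with positive diagonal such that T T' = G
    (G is assumed symmetric and positive definite)."""
    n = len(G)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(T[i][k] * T[j][k] for k in range(j))
            T[i][j] = math.sqrt(s) if i == j else s / T[j][j]
    return T

# Inner product matrix YY' of the examples.
G = [[12, 9, 9, 162], [9, 9, 9, 135], [9, 9, 15, 141], [162, 135, 141, 2302]]
T = cholesky(G)

# Triangular matrix from the regression computations: coefficients on the
# orthogonal basis scaled by the lengths sqrt(12), sqrt(2 1/4) = 1.5, sqrt(6), 1.
r12, r6, r28 = math.sqrt(12), math.sqrt(6), math.sqrt(28)
expected = [[r12, 0.0, 0.0, 0.0],
            [0.75 * r12, 1.5, 0.0, 0.0],
            [0.75 * r12, 1.5, r6, 0.0],
            [13.5 * r12, 6 * 1.5, r6, r28]]

for row in T:
    print([round(x, 4) for x in row])
```

The squares of the entries in the last row of the triangular factor are the analysis-of-variance components 2187, 81, 6, 28.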

7 THE REDUCED MODEL 

Consider the regression model as developed in Sections 2, 3:

$$f(E)\,dE,$$

$$[Y] = \theta[E], \qquad D(Y) = D(E).$$

Let $[Y]$ be the transformation variable defined in Section 3 and $W = PV$ be
the orthogonal basis defined in Section 4.

The invariant differential on the space $X = R^n$ can be obtained (Section 3,
Chapter Two) from the Jacobian of a transformation on that space:

$$\tilde{y}_1 = a_1v_{11} + \cdots + a_rv_{r1} + cy_1$$
$$\vdots$$
$$\tilde{y}_n = a_1v_{1n} + \cdots + a_rv_{rn} + cy_n,$$

$$J_n(g : Y) = c^n,$$

$$J_n(Y) = s^n(\mathbf{y}),$$

$$dm(Y) = \frac{dY}{J_n(Y)} = \frac{\prod dy_i}{s^n(\mathbf{y})}.$$

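The left and right invariant differentials on the group can be obtained
(Sections 3 and 4 in Chapter Two) from Jacobians of transformations on the
group:

$$\begin{pmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ \tilde{a}_1 & \cdots & \tilde{a}_r & \tilde{c} \end{pmatrix} = \begin{pmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ a_1 & \cdots & a_r & c \end{pmatrix} \begin{pmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ a_1^* & \cdots & a_r^* & c^* \end{pmatrix},$$

$$\tilde{a}_1 = a_1 + ca_1^*$$
$$\vdots$$
$$\tilde{a}_r = a_r + ca_r^*,$$
$$\tilde{c} = cc^*.$$

The Jacobian with respect to $(a_1^*, \ldots, a_r^*, c^*)$ and the Jacobian with respect to
$(a_1, \ldots, a_r, c)$ are

$$\begin{vmatrix} c & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & c & 0 \\ 0 & \cdots & 0 & c \end{vmatrix} = c^{r+1}, \qquad \begin{vmatrix} 1 & \cdots & 0 & a_1^* \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & a_r^* \\ 0 & \cdots & 0 & c^* \end{vmatrix} = c^*,$$

and hence

$$J_{r+1}(g) = c^{r+1}, \qquad d\mu(g) = \frac{dg}{c^{r+1}} = \frac{\prod da_u\,dc}{c^{r+1}},$$

$$J_{r+1}^*(g) = c, \qquad d\nu(g) = \frac{dg}{c} = \frac{\prod da_u\,dc}{c},$$

$$\Delta(g) = \frac{J_{r+1}(g)}{J_{r+1}^*(g)} = c^r.$$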

7.1 General Distributions. The conditional probability element for the
error position $[E]$ given the orbit $D(E) = D$ can be obtained by exchanging
invariant differentials (Section 4 in Chapter Two):

$$g([E] : D)\,d[E] = k(D)\,f([E]D)\,s^n\,\frac{\prod db_u\,ds}{s^{r+1}}$$

$$= k(D) \prod_{i=1}^{n} f\Bigl(\sum_u b_uv_{ui} + sd_i\Bigr)\,s^{n-r-1} \prod db_u\,ds.$$

And the structural probability element for $\theta$ given $Y$ can be obtained by
manipulating invariant differentials (Section 5 in Chapter Two):

$$g^*(\theta : Y)\,d\theta = k(D(Y))\,f(\theta^{-1}Y)\Bigl(\frac{s(\mathbf{y})}{\sigma}\Bigr)^{n}(s(\mathbf{y}))^{-r}\,d\nu(\theta)$$

$$= k(D(Y)) \prod_{i=1}^{n} f\Bigl(\frac{y_i - \sum_u \beta_uv_{ui}}{\sigma}\Bigr)\Bigl(\frac{s(\mathbf{y})}{\sigma}\Bigr)^{n}(s(\mathbf{y}))^{-r}\,\frac{\prod d\beta_u\,d\sigma}{\sigma}.$$

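7.2 Decomposition. The structural equation for the regression model,

$$b_1(\mathbf{y}) = \beta_1 + \sigma b_1(\mathbf{e})$$
$$\vdots$$
$$b_r(\mathbf{y}) = \beta_r + \sigma b_r(\mathbf{e}),$$
$$s(\mathbf{y}) = \sigma s(\mathbf{e}),$$

can be separated into a part concerning the $\beta$'s:

$$\frac{b_1(\mathbf{y}) - \beta_1}{s(\mathbf{y})} = \frac{b_1(\mathbf{e})}{s(\mathbf{e})} = t_1(\mathbf{e})$$
$$\vdots$$
$$\frac{b_r(\mathbf{y}) - \beta_r}{s(\mathbf{y})} = \frac{b_r(\mathbf{e})}{s(\mathbf{e})} = t_r(\mathbf{e});$$

and a part concerning $\sigma$:

$$\frac{s(\mathbf{y})}{\sigma} = s(\mathbf{e}).$$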


In a related manner the regression-scale group has a location subgroup:

$$L = \left\{ \begin{pmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ a_1 & \cdots & a_r & 1 \end{pmatrix} : -\infty < a_u < \infty \right\};$$

and a scale subgroup:

$$S = \left\{ \begin{pmatrix} 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \\ 0 & \cdots & 0 & c \end{pmatrix} : 0 < c < \infty \right\}.$$

(See Figure 11.)

Figure 11 The regression-scale group as $G^*$, recording the position $[E]$ of $E$ on its orbit.
The normal regression subgroup $L$. The scale subgroup $S$; and an orbit or right coset $S[E]$
of the scale subgroup $S$.


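7.3 Location Distributions. Consider inference concerning the $\beta$'s. The
marginal distribution of the error variable $(t_1(\mathbf{e}), \ldots, t_r(\mathbf{e}))$ can be obtained
from the error probability distribution

$$k(D)\,f([E]D)\,s^{n-r-1} \prod db_u\,ds:$$

the joint distribution of $(t_1, \ldots, t_r, s)$ is

$$k(D)\,f([E]D)\,s^{n-1} \prod dt_u\,ds;$$

the marginal distribution of $(t_1(\mathbf{e}), \ldots, t_r(\mathbf{e}))$ is

$$g_L(\mathbf{t} : D) \prod dt_u = k(D) \int_0^\infty \prod_{i=1}^{n} f\Bigl(s\Bigl(\sum_u t_uv_{ui} + d_i\Bigr)\Bigr)\,s^{n-1}\,ds \cdot \prod dt_u.$$

The structural distribution for $(\beta_1, \ldots, \beta_r)$ is

$$g_L\Bigl(\frac{b_1(\mathbf{y}) - \beta_1}{s(\mathbf{y})}, \ldots, \frac{b_r(\mathbf{y}) - \beta_r}{s(\mathbf{y})} : D(Y)\Bigr)\,\frac{\prod d\beta_u}{s^r(\mathbf{y})}.$$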

A group element g can be factored uniquely, 

1 0 0 


0 1 0 
£1 . . . £r j 

c c 

as an element of S times an element of L (G is a semidirect product of S 
and L \ definition on p. 69) The scale group S generates orbits (right cosets) 
on the error space G *: 

1 0 0 ' 

= 5 

0 1 0 

M e ) ••• Me) MM - 

see Figure 11. The error variable (q(e), . . . , q(e)) indexes these scale-group 
orbits. A group element g can alternatively be factored uniquely: 






TMH?- 


ds - n Mu- 



1 0 0 


0 1 0 

0 ••• 0 
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as an element of L times an element of S. The scale group S generates the 
left coset 



for the quantity 0. Thus the right coset distribution of the t s produces a 
left coset distribution for the (3's. 
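The two factorizations of a group element can be illustrated with small matrices. This is a sketch for $r = 2$ with arbitrary illustrative values of $a_1, a_2, c$; the helper names `element` and `matmul` are hypothetical and introduced only for this example.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def element(a, c):
    """Regression-scale group element: identity rows, then the last row (a_1, ..., a_r, c)."""
    r = len(a)
    g = [[1.0 if i == j else 0.0 for j in range(r + 1)] for i in range(r)]
    g.append(list(a) + [c])
    return g

a, c = [3.0, -1.5], 2.0                    # illustrative values only
g = element(a, c)
scale = element([0.0, 0.0], c)             # element of the scale subgroup S

# g as an element of S times an element of L: the L factor carries a_u / c.
g_SL = matmul(scale, element([ai / c for ai in a], 1.0))

# g as an element of L times an element of S: the L factor carries a_u itself.
g_LS = matmul(element(a, 1.0), scale)

print(g_SL == g, g_LS == g)
```

The location factor differs between the two orders ($a_u/c$ against $a_u$), which is why right cosets of $S$ are indexed by the $t$'s while left cosets are indexed by the $\beta$'s.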

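7.4 Scale Distributions. Now consider inference concerning $\sigma$. The marginal distribution of the error quantity $s(e)$ is

$$g_S(s:D)\,ds = k(D)\int\!\cdots\!\int \prod_{i=1}^{n} f\Big(\sum_u b_u v_{ui} + s\,d_i\Big)\prod db_u\cdot s^{n-r-1}\,ds.$$

The structural distribution for $\sigma$ is obtained by the substitution $s = s(y)/\sigma$:

$$g_S^*(\sigma:Y)\,d\sigma = g_S\Big(\frac{s(y)}{\sigma} : D(Y)\Big)\,\frac{s(y)}{\sigma^{2}}\,d\sigma.$$
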
7.5 Tests of Location. Consider tests of significance concerning the location quantity $\beta = (\beta_1, \ldots, \beta_r)'$. Suppose some outside source has indicated that

$$\beta = \beta_0 = (\beta_{10}, \ldots, \beta_{r0})'.$$

The hypothesis $\beta = \beta_0$, together with the structural equation, leads to the value

$$(t_1(e), \ldots, t_r(e)) = \Big(\frac{b_1(y) - \beta_{10}}{s(y)}, \ldots, \frac{b_r(y) - \beta_{r0}}{s(y)}\Big)$$

for a characteristic of the unknown error $e$. This value can be compared with the distribution

$$g_L(t:D(Y))\prod dt_u$$

derived from the error probability distribution; and the hypothesis can be assessed accordingly.

Now suppose some outside source has indicated that $\beta_r = \beta_{r0}$. The hypothesis $\beta_r = \beta_{r0}$, together with the structural equation, gives the value

$$t_r(e) = \frac{b_r(y) - \beta_{r0}}{s(y)} = \frac{a_r(y) - \beta_{r0}}{s(y)}$$

for a characteristic of the unknown error $e$ (the coefficient of the last structural vector is unaffected by the shift to the orthogonal basis). This value can be compared with the distribution of the variable

$$t_r(e) = \frac{b_r(e)}{s(e)} = \frac{a_r(e)}{s(e)}$$

derived from the distribution $g([E]:D(Y))\,d[E]$ or equivalently from the distribution $g_L(t:D(Y))\,dt$, and the hypothesis can be assessed accordingly.

7.6 A Test of Scale. Now consider tests of significance concerning the scale quantity $\sigma$. Suppose some outside source has indicated that $\sigma = \sigma_0$. The hypothesis $\sigma = \sigma_0$ leads to the value

$$s(e) = \frac{s(y)}{\sigma_0}$$

for a characteristic of the unknown error $e$. This value for $s(e)$ can be compared with the distribution

$$g_S(s:D(Y))\,ds,$$

and the hypothesis assessed accordingly.
8 WITH NORMAL ERROR

Consider the regression model with standard normal error variables:

$$f(E)\,dE = (2\pi)^{-n/2}\exp\{-\tfrac12|e|^2\}\prod de_i,$$

$$[Y] = \theta[E], \qquad D(Y) = D(E).$$

The alternative model with orthogonal basis is

$$f(E)\,dE = (2\pi)^{-n/2}\exp\{-\tfrac12|e|^2\}\prod de_i,$$

$$[Y] = \theta[E], \qquad D(Y) = D(E),$$

with position and quantity now expressed relative to the orthogonal vectors $w_1, \ldots, w_r$.

8.1 General Distributions: Error. The error probability distribution (orthogonal basis) is

$$\begin{aligned}
g([E]:D)\,d[E]
&= k(D)(2\pi)^{-n/2}\exp\Big\{-\tfrac12\Big|\sum_u a_u w_u + s\,d\Big|^2\Big\}\,s^{n-r-1}\prod da_u\,ds\\
&= k'(D)\exp\Big\{-\tfrac12\Big(\sum_u a_u^2|w_u|^2 + s^2\Big)\Big\}\,s^{n-r-1}\prod da_u\,ds\\
&= \frac{\prod|w_u|}{(2\pi)^{r/2}}\exp\Big\{-\tfrac12\sum_u a_u^2|w_u|^2\Big\}\prod da_u
\cdot\frac{1}{\Gamma((n-r)/2)}\Big(\frac{s^2}{2}\Big)^{(n-r)/2-1}\exp\Big\{-\frac{s^2}{2}\Big\}\,d\Big(\frac{s^2}{2}\Big)\\
&= \frac{\prod|w_u|}{(2\pi)^{r/2}}\exp\Big\{-\tfrac12\sum_u a_u^2|w_u|^2\Big\}\prod da_u
\cdot\frac{A_{n-r}}{(2\pi)^{(n-r)/2}}\,s^{n-r-1}\exp\Big\{-\frac{s^2}{2}\Big\}\,ds.
\end{aligned}$$

The error regression coefficients $a_1(e), \ldots, a_r(e)$ are independent normal variables with means equal to zero and with variances $|w_1|^{-2}, \ldots, |w_r|^{-2}$; the squared residual length $s^2(e)$ has a chi-square distribution on $n - r$ degrees of freedom; $a(e)$ and $s(e)$ are statistically independent; the error probability distribution does not depend on the orbit as given by $D$. It follows then that the elements $a_1^2(e)|w_1|^2, \ldots, a_r^2(e)|w_r|^2, s^2(e)$ in the structure-of-component column of the analysis-of-variance table (Section 4) are independent chi-square variables with degrees of freedom $1, \ldots, 1, n - r$ as given in the dimension column. Note: The density factored so that the variables separate; the factors have normal and chi-square form; the usual normal and chi-square normalizing constants were introduced.
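The chi normalizing constant in the last line can be checked numerically. The sketch below assumes that $A_k$ denotes the surface area of the unit sphere in $R^k$, $A_k = 2\pi^{k/2}/\Gamma(k/2)$; that reading is consistent with the gamma form in the preceding line but is stated here as an assumption. The integration routine is a plain midpoint rule written for this illustration.

```python
import math

def sphere_area(k):
    """Surface area A_k of the unit sphere in R^k (assumed meaning of A_k)."""
    return 2.0 * math.pi ** (k / 2.0) / math.gamma(k / 2.0)

def chi_density(s, k):
    """Density A_k (2 pi)^(-k/2) s^(k-1) exp(-s^2/2) of a chi variable on k degrees of freedom."""
    return sphere_area(k) / (2.0 * math.pi) ** (k / 2.0) * s ** (k - 1) * math.exp(-0.5 * s * s)

def integrate(f, lo, hi, steps=20000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

k = 9  # for example n - r = 9
total = integrate(lambda s: chi_density(s, k), 0.0, 20.0)
mean_sq = integrate(lambda s: s * s * chi_density(s, k), 0.0, 20.0)

print(total)    # close to 1: the factor is a probability density
print(mean_sq)  # close to k: s^2 is chi-square on k degrees of freedom
```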

The error probability distribution (original basis) can be obtained by the change of variable

$$(a_1, \ldots, a_r) = (b_1, \ldots, b_r)P^{-1}, \qquad s = s,$$

and the substitution $W = PV$:

$$g([E]:D)\,d[E] = \frac{|VV'|^{1/2}}{(2\pi)^{r/2}}\exp\{-\tfrac12\,b'VV'b\}\prod db_u
\cdot\frac{A_{n-r}}{(2\pi)^{(n-r)/2}}\,s^{n-r-1}\exp\Big\{-\frac{s^2}{2}\Big\}\,ds.$$

The error regression coefficients $b_1(e), \ldots, b_r(e)$ have a multivariate normal distribution† with means 0 and covariance matrix $(VV')^{-1}$; the squared residual length $s^2(e)$ has a chi-square distribution on $n - r$ degrees of freedom; $b(e)$, $s(e)$ are statistically independent.

8.2 General Distributions: Structural. The structural probability element (orthogonal basis) can be obtained from the error distribution by substitution, or from the expression in the preceding subsection:

$$g^*(\theta:Y)\,d\theta = \frac{\prod|w_u|}{(2\pi\sigma^2)^{r/2}}\exp\Big\{-\frac{1}{2\sigma^2}\sum_u(\alpha_u - a_u(y))^2|w_u|^2\Big\}\prod d\alpha_u
\cdot\frac{A_{n-r}}{(2\pi)^{(n-r)/2}}\Big(\frac{s(y)}{\sigma}\Big)^{n-r-1}\exp\Big\{-\frac{s^2(y)}{2\sigma^2}\Big\}\frac{s(y)}{\sigma}\frac{d\sigma}{\sigma}.$$

The quantities $\alpha_1, \ldots, \alpha_r$ conditional on $\sigma$ are independent normal variables with means $a_1(y), \ldots, a_r(y)$ and variances $|w_1|^{-2}\sigma^2, \ldots, |w_r|^{-2}\sigma^2$; the marginal distribution of $\sigma^2$ is that of $s^2(y)\chi^{-2}$, where $\chi^2$ has a chi-square distribution on $n - r$ degrees of freedom.

The structural probability element (original basis) is

$$g^*(\theta:Y)\,d\theta = \frac{|VV'|^{1/2}}{(2\pi\sigma^2)^{r/2}}\exp\Big\{-\tfrac12(\beta - b(y))'\frac{VV'}{\sigma^2}(\beta - b(y))\Big\}\prod d\beta_u
\cdot\frac{A_{n-r}}{(2\pi)^{(n-r)/2}}\Big(\frac{s(y)}{\sigma}\Big)^{n-r-1}\exp\Big\{-\frac{s^2(y)}{2\sigma^2}\Big\}\frac{s(y)}{\sigma}\frac{d\sigma}{\sigma}.$$

The quantity $(\beta_1, \ldots, \beta_r)$ conditional on $\sigma$ is multivariate normal with mean $(b_1(y), \ldots, b_r(y))$ and covariance matrix $(VV')^{-1}\sigma^2$; the marginal distribution of $\sigma^2$ is that of $s^2(y)\chi^{-2}$, where $\chi^2$ has a chi-square distribution on $n - r$ degrees of freedom.

8.3 Location Distributions: Error. The marginal probability distribution (orthogonal basis) of the error variable

$$t = t(e) = \Big(\frac{a_1(e)}{s(e)}, \ldots, \frac{a_r(e)}{s(e)}\Big)'$$


† A nonsingular linear transformation of a vector of independent normal variables gives a multivariate normal variable; e.g., $b' = a'P$, $a' = b'P^{-1}$. Note that $E(a) = 0$, $E(a_u^2) = |w_u|^{-2}$, $E(a_u a_v) = 0$ $(u \neq v)$, $E(aa') = (WW')^{-1}$; hence that $E(P'^{-1}b) = 0$, $E(P'^{-1}bb'P^{-1}) = (WW')^{-1}$; and hence that $E(b) = 0$, $E(bb') = (VV')^{-1}$.

can be obtained from the joint distribution of $a$, $s$ by substitution and integration:

$$\begin{aligned}
g_L(t:D)\,dt
&= \int_0^\infty\frac{|WW'|^{1/2}}{(2\pi)^{r/2}}\exp\Big\{-\frac{s^2}{2}\,t'WW't\Big\}\frac{A_{n-r}}{(2\pi)^{(n-r)/2}}\,s^{n-r-1}\exp\Big\{-\frac{s^2}{2}\Big\}\,s^{r}\,ds\,dt\\
&= |WW'|^{1/2}A_{n-r}\int_0^\infty\frac{1}{(2\pi)^{n/2}}\exp\Big\{-\frac{s^2}{2}(1 + t'WW't)\Big\}\,s^{n-1}\,ds\,dt\\
&= \frac{|WW'|^{1/2}A_{n-r}}{(1 + t'WW't)^{n/2}}\int_0^\infty\frac{1}{(2\pi)^{n/2}}\exp\Big\{-\frac{u^2}{2}\Big\}\,u^{n-1}\,du\,dt\\
&= \frac{A_{n-r}}{A_n}\,\frac{|WW'|^{1/2}}{(1 + t'WW't)^{n/2}}\,dt.
\end{aligned}$$

This is an r-dimensional analog of the t-distribution: The $a$'s are independent normal variables with means equal to zero; the $s$ is chi on $n - r$; the vector $t$ is the vector of $a$'s expressed in units of $s$. The distribution in the case $r = 1$ and $|w_1| = 1$ is

$$\frac{A_{n-1}}{A_n}\,\frac{dt}{(1 + t^2)^{n/2}};$$

the variable $t$ is a simplified t-variable; it is an ordinary t-variable divided by the square root of its degrees of freedom. The marginal distribution of

$$t_r(e) = \frac{b_r(e)}{s(e)} = \frac{a_r(e)}{s(e)}$$

can then be obtained by noting that $t_r$ is a central normal variable in units of a chi variable; the marginal distribution is

$$\frac{A_{n-r}}{A_{n-r+1}}\,\frac{|w_r|\,dt_r}{(1 + t_r^2|w_r|^2)^{(n-r+1)/2}};$$

it is a rescaled t-distribution on $n - r$ degrees of freedom. The distribution of $t$ for general $r$ and with $|w_1| = \cdots = |w_r| = 1$ is

$$\frac{A_{n-r}}{A_n}\,\frac{dt}{(1 + |t|^2)^{n/2}};$$

this is the simplified t-distribution in r dimensions with $n - r$ degrees of freedom.
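The statement that the simplified t-variable is an ordinary t-variable divided by the square root of its degrees of freedom can be verified pointwise. The sketch below again assumes $A_k = 2\pi^{k/2}/\Gamma(k/2)$ and uses the standard Student density; the function names are hypothetical.

```python
import math

def sphere_area(k):
    """Surface area of the unit sphere in R^k (assumed meaning of A_k)."""
    return 2.0 * math.pi ** (k / 2.0) / math.gamma(k / 2.0)

def simplified_t_density(u, n):
    """A_{n-1}/A_n * (1 + u^2)^(-n/2): the case r = 1, |w_1| = 1."""
    return sphere_area(n - 1) / sphere_area(n) * (1.0 + u * u) ** (-n / 2.0)

def student_t_density(x, df):
    """Ordinary Student t density on df degrees of freedom."""
    c = math.gamma((df + 1) / 2.0) / (math.sqrt(df * math.pi) * math.gamma(df / 2.0))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2.0)

n = 10
df = n - 1
# If X is an ordinary t on df degrees of freedom, U = X / sqrt(df) has density
# sqrt(df) * f_X(sqrt(df) * u); compare with the simplified form at a few points.
points = [-2.0, -0.5, 0.0, 0.3, 1.0, 4.0]
max_gap = max(
    abs(simplified_t_density(u, n) - math.sqrt(df) * student_t_density(math.sqrt(df) * u, df))
    for u in points
)
print(max_gap)
```

The gap is at rounding level, so tables of the ordinary t-distribution can be used after multiplying the simplified variable by the square root of the degrees of freedom.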

The marginal probability distribution (original basis) of the error variable

$$t(e) = (t_1(e), \ldots, t_r(e))' = \Big(\frac{b_1(e)}{s(e)}, \ldots, \frac{b_r(e)}{s(e)}\Big)'$$

can be obtained by the change of variable

$$\Big(\frac{a_1(e)}{s(e)}, \ldots, \frac{a_r(e)}{s(e)}\Big) = (t_1(e), \ldots, t_r(e))P^{-1}.$$

The marginal distribution of $t(e)$ is

$$\frac{A_{n-r}}{A_n}\,\frac{|VV'|^{1/2}}{(1 + t'VV't)^{n/2}}\,dt.$$

This is an r-dimensional t-distribution; note that the quadratic expression in the denominator now contains cross-product terms.

8.4 Location Distributions: Structural. The structural probability element (orthogonal basis) for the location quantity $\alpha = (\alpha_1, \ldots, \alpha_r)'$ is

$$\frac{A_{n-r}}{A_n}\,\frac{|WW'|^{1/2}\,s^{-r}(y)}{\Big(1 + (\alpha - a(y))'\dfrac{WW'}{s^2(y)}(\alpha - a(y))\Big)^{n/2}}\,d\alpha;$$

this is a relocated and rescaled multivariate t-distribution.

The structural probability element (original basis) for the location quantity $\beta = (\beta_1, \ldots, \beta_r)'$ is

$$\frac{A_{n-r}}{A_n}\,\frac{|VV'|^{1/2}\,s^{-r}(y)}{\Big(1 + (\beta - b(y))'\dfrac{VV'}{s^2(y)}(\beta - b(y))\Big)^{n/2}}\,d\beta;$$

this is a relocated and rescaled t-distribution (with cross-product terms).

8.5 The Example. Consider the example with three structural vectors (Sections 1 and 6) and suppose that the error variable is standard normal. The regression model for the example is

$$e_1 = z_1,\ \ldots,\ e_{12} = z_{12},$$

$$\begin{pmatrix}
1&1&1&1&1&1&1&1&1&1&1&1\\
0&0&0&1&1&1&1&1&1&1&1&1\\
0&0&0&0&0&0&1&1&1&2&2&2\\
7&12&8&15&13&14&17&14&14&17&14&17
\end{pmatrix}
=
\begin{pmatrix}
1&0&0&0\\
0&1&0&0\\
0&0&1&0\\
\beta_1&\beta_2&\beta_3&\sigma
\end{pmatrix}
\begin{pmatrix}
1&1&1&1&1&1&1&1&1&1&1&1\\
0&0&0&1&1&1&1&1&1&1&1&1\\
0&0&0&0&0&0&1&1&1&2&2&2\\
e_1&e_2&e_3&e_4&e_5&e_6&e_7&e_8&e_9&e_{10}&e_{11}&e_{12}
\end{pmatrix}.$$

The reduced model (orthogonal basis) is

$$a_1(e) = \frac{z_1}{\sqrt{12}},\quad a_2(e) = \frac{z_2}{\sqrt{2\tfrac14}},\quad a_3(e) = \frac{z_3}{\sqrt 6},\quad s(e) = \chi_9,$$

$$\begin{aligned}
13\tfrac12 &= \alpha_1 + \sigma a_1(e),\\
6 &= \alpha_2 + \sigma a_2(e),\\
1 &= \alpha_3 + \sigma a_3(e),\\
\sqrt{28} &= \sigma s(e),
\end{aligned}$$

where $z_1, z_2, z_3$ here now designate standard normal variables and $\chi_9$ is a chi variable on nine degrees of freedom. The reduced model can also be presented relative to the corresponding orthonormal basis:

$$z_1,\ z_2,\ z_3,\ \chi_9,$$

$$\begin{aligned}
13\tfrac12\sqrt{12} &= \alpha_1\sqrt{12} + \sigma z_1,\\
6\sqrt{2\tfrac14} &= \alpha_2\sqrt{2\tfrac14} + \sigma z_2,\\
1\sqrt 6 &= \alpha_3\sqrt 6 + \sigma z_3,\\
\sqrt{28} &= \sigma\chi_9.
\end{aligned}$$

In this alternative form the adjusted regression coefficients $13\tfrac12\sqrt{12}$, $6\sqrt{2\tfrac14}$, $1\sqrt 6$ measure Euclidean distance in the directions defined by the orthogonal basis and they produce directly the components in the analysis-of-variance table:

    Source             Dimension   Component   Structure of Component
    Mean (w_1)             1          2187     (α_1 √12 + σ z_1)²
    Treatment (w_2)        1            81     (α_2 √(2¼) + σ z_2)²
    Variable (w_3)         1             6     (α_3 √6 + σ z_3)²
    Residual (d)           9            28     (σ χ_9)²
    Total                 12          2302
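The entries of the table can be reproduced from the data by successive orthogonalization of the structural vectors. The following is a minimal sketch using only the data shown in the model above; the variable names (`ones`, `treat`, `x`, `basis`) are chosen for this illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

ones = [1.0] * 12
treat = [0.0] * 3 + [1.0] * 9
x = [0.0] * 6 + [1.0] * 3 + [2.0] * 3
y = [7, 12, 8, 15, 13, 14, 17, 14, 14, 17, 14, 17]

# Gram-Schmidt: w_1 = 1, w_2 = treatment vector less its projection on w_1,
# w_3 = x less its projections on w_1 and w_2.
basis = []
for v in (ones, treat, x):
    w = list(v)
    for u in basis:
        coef = dot(v, u) / dot(u, u)
        w = [wi - coef * ui for wi, ui in zip(w, u)]
    basis.append(w)

lengths_sq = [dot(w, w) for w in basis]                      # |w_u|^2
coefs = [dot(y, w) / dot(w, w) for w in basis]               # a_u(y)
components = [a * a * l for a, l in zip(coefs, lengths_sq)]  # a_u(y)^2 |w_u|^2
total = dot(y, y)
residual = total - sum(components)                           # s^2(y)

print(lengths_sq)   # 12, 2.25, 6
print(coefs)        # 13.5, 6, 1
print(components, residual, total)
```

The components add to the total sum of squares because the vectors $w_1, w_2, w_3$ and the residual are mutually orthogonal.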



Consider whether the variable $x$ affects the response level. The hypothesis $\beta_3 = 0$ or, equivalently, the hypothesis $\alpha_3 = 0$ leads to the value of an error characteristic:

$$1 = \sigma a_3(e),\qquad \sqrt{28} = \sigma s(e),\qquad t_3 = \frac{1}{\sqrt{28}};$$

the corresponding error variable is

$$t_3(e) = \frac{a_3(e)}{s(e)} = \frac{z_3/\sqrt 6}{\chi_9}.$$

This can be related to an ordinary t-variable on nine degrees of freedom: The value is

$$t = \frac{\sqrt 6}{1/\sqrt 9}\cdot\frac{1}{\sqrt{28}} = 1.388,$$

and the variable is

$$t = \frac{z_3}{\chi_9/\sqrt 9}.$$

The value is a reasonable value for the distribution: The observations are in accord with the hypothesis $\alpha_3 = 0$.

The hypothesis can also be assessed from the analysis-of-variance table. The hypothesis $\alpha_3 = 0$ leads to the value

$$\frac{6}{28}$$

for the variable

$$\frac{(\sigma z_3)^2}{(\sigma\chi_9)^2} = \frac{z_3^2}{\chi_9^2}.$$

This can be related to an ordinary F-variable on one over nine degrees of freedom: The value is

$$F = \frac{6/1}{28/9} = 1.929,$$

and the variable is

$$F = \frac{z_3^2/1}{\chi_9^2/9}.$$

This is equivalent to the preceding test.
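The arithmetic of the two tests, and their equivalence, can be sketched as follows; the inputs are the table entries for the variable $w_3$ and the residual.

```python
import math

a3 = 1.0          # regression coefficient a_3(y)
w3_sq = 6.0       # |w_3|^2
res_ss = 28.0     # residual sum of squares s^2(y)
res_df = 9        # residual degrees of freedom

# Ordinary t value: adjusted coefficient over the root mean square residual.
t_value = a3 * math.sqrt(w3_sq) / math.sqrt(res_ss / res_df)

# Ordinary F value: component mean square over residual mean square.
f_value = (a3 ** 2 * w3_sq / 1) / (res_ss / res_df)

print(t_value, f_value, t_value ** 2)
```

The F value is the square of the t value, which is the sense in which the two assessments are equivalent.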

Suppose now that the effect of the variable $x$ is negligible. The model then becomes a regression model with two structural vectors, and the effect of the treatment can be assessed.

The hypothesis $\alpha_2 = 0$ can be tested in the preceding manner, or the structural distribution for the treatment quantity $\alpha_2$ can be derived. Consider the structural distribution. The structural distribution can be derived within the model that has two structural vectors. As a reasonable precaution, however, it is preferable to derive it within the larger model having three structural vectors, in case the effect of $\alpha_3$ is not negligible. The structural distribution is

$$\begin{aligned}
\alpha_2 &= 6 - \frac{\sqrt{28}}{s(e)}\,a_2(e)\\
&= 6 - \frac{\sqrt{28}}{\chi_9}\,\frac{z_2}{\sqrt{2\tfrac14}}\\
&= 6 - \frac{\sqrt{28}}{\sqrt{2\tfrac14}\,\sqrt 9}\,t\\
&= 6 - 1.17\,t.
\end{aligned}$$

The distribution has the form of a t-distribution on nine degrees of freedom, but relocated at $\alpha_2 = 6$ and rescaled by the factor 1.17.
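The structural distribution for $\alpha_2$ can also be examined by simulation. The sketch below draws the standard normal $z_2$ and the chi variable $\chi_9$ directly and applies the first form of the equation; the seed and the number of draws are arbitrary choices for the illustration.

```python
import math
import random

# Rescaling factor sqrt(28) / (sqrt(2 1/4) * sqrt(9)).
scale = math.sqrt(28.0) / (math.sqrt(2.25) * math.sqrt(9.0))

def draw_alpha2(rng):
    """One draw of alpha_2 = 6 - (sqrt(28)/chi_9) * z_2 / sqrt(2 1/4)."""
    z = rng.gauss(0.0, 1.0)
    chi = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(9)))
    return 6.0 - math.sqrt(28.0) * (z / math.sqrt(2.25)) / chi

rng = random.Random(1)
draws = sorted(draw_alpha2(rng) for _ in range(20000))
center = draws[len(draws) // 2]          # sample median
lower, upper = draws[500], draws[19499]  # rough central 95% interval

print(scale)
print(center, lower, upper)
```

The unrounded factor is about 1.176; the sample median sits near the relocation point 6, as the symmetric t form requires.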


THE PROGRESSION MODEL†


9 THE MODEL


Consider a stable system with a sequence of response variables $y_1, \ldots, y_p$. Suppose the internal error of the system produces a sequence of errors $e_1, \ldots, e_p$ with a known distribution $f(e_1, \ldots, e_p)$ on $R^p$. Suppose also that the sequence of errors progressively affects the sequence of response variables. As system characteristics for the first response component let $\mu_1$ be the general level and $\sigma_{(1)}$ be the response scaling of the first error component; and for the second response let $\mu_2$ be the general level, $\sigma_{(2)}$ be the response scaling of the second error component, and $\tau_{21}$ be the response multiple of the preceding error component; and for the pth response let $\mu_p$ be the general level, $\sigma_{(p)}$ be the response scaling of the pth error component, and $\tau_{p1}, \ldots, \tau_{p\,p-1}$ be the multiples of the preceding error components. A realized sequence of errors and the corresponding sequence of response values are then connected by the equations

$$\begin{aligned}
y_1 &= \mu_1 + \sigma_{(1)}e_1,\\
y_2 &= \mu_2 + \tau_{21}e_1 + \sigma_{(2)}e_2,\\
&\ \ \vdots\\
y_p &= \mu_p + \tau_{p1}e_1 + \cdots + \tau_{p\,p-1}e_{p-1} + \sigma_{(p)}e_p.
\end{aligned}$$

† The remainder of this chapter may be omitted for a first reading: the methods of the regression model are used to construct a multiple response model, the progression model; the progression model has a limited range of applications but its notation and results are needed for the more useful multivariate model in Chapter Five. The material here can be read as preliminary material for Sections 4, 5, 6 in Chapter Five.
$$\prod_{i=1}^{n} f(e_{1i}, \ldots, e_{pi})\prod_{i=1}^{n}de_{1i}\cdots\prod_{i=1}^{n}de_{pi}.$$

The model has two parts: an error distribution with $e_1, \ldots, e_p$ as variables; and a structural equation in which realized errors $e_1, \ldots, e_p$ determine the relation between the unknown system characteristics and the known response observations.

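The notation can be made more flexible by letting

$$Y = \begin{pmatrix} \mathbf 1' \\ \mathbf y_1' \\ \vdots \\ \mathbf y_p' \end{pmatrix}$$

designate the sequence of response vectors with appended 1-vector, by letting

$$E = \begin{pmatrix} \mathbf 1' \\ \mathbf e_1' \\ \vdots \\ \mathbf e_p' \end{pmatrix}$$

designate the sequence of error vectors with appended 1-vector, by letting

$$\theta = \begin{pmatrix}
1 & 0 & & \cdots & 0\\
\mu_1 & \sigma_{(1)} & & & \\
\mu_2 & \tau_{21} & \sigma_{(2)} & & \\
\vdots & \vdots & & \ddots & \\
\mu_p & \tau_{p1} & \cdots & \tau_{p\,p-1} & \sigma_{(p)}
\end{pmatrix}$$

designate the quantity describing the system characteristics, and by letting

$$f(E)\,dE = \prod_{i=1}^{n} f(e_{1i}, \ldots, e_{pi})\prod_{i=1}^{n}de_{1i}\cdots\prod_{i=1}^{n}de_{pi}$$

designate the error distribution. The progression model can then be written

$$f(E)\,dE,$$

$$Y = \theta E.$$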

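The transformation $\theta$ is an element of the location-progression group

$$G = \left\{ g = \begin{pmatrix}
1 & 0 & & \cdots & 0\\
a_1 & c_1 & & & \\
a_2 & k_{21} & c_2 & & \\
\vdots & \vdots & & \ddots & \\
a_p & k_{p1} & \cdots & k_{p\,p-1} & c_p
\end{pmatrix}
= \begin{pmatrix} 1 & 0' \\ a & K \end{pmatrix} :
\begin{array}{l} -\infty < a < \infty \\ -\infty < k < \infty \\ 0 < c < \infty \end{array} \right\},$$

with group properties

$$\begin{pmatrix} 1 & 0' \\ a & K \end{pmatrix}\begin{pmatrix} 1 & 0' \\ a^* & K^* \end{pmatrix} = \begin{pmatrix} 1 & 0' \\ a + Ka^* & KK^* \end{pmatrix},
\qquad
\begin{pmatrix} 1 & 0' \\ a & K \end{pmatrix}^{-1} = \begin{pmatrix} 1 & 0' \\ -K^{-1}a & K^{-1} \end{pmatrix}.$$

The component matrices $K$ form the progression group $G_2$ under matrix multiplication: The product of two positive lower triangular matrices is positive lower triangular; the inverse of a positive lower triangular matrix is positive lower triangular (the elements of the inverse can be calculated successively from top row to bottom row, from right to left in each row).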
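The group properties can be illustrated for $p = 2$. This is a sketch with arbitrary illustrative entries; the helper names (`element`, `lower_inverse`, `is_positive_lower`) are hypothetical, and the inverse is computed by ordinary forward substitution rather than in the particular element order described in the text.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def lower_inverse(K):
    """Inverse of a lower triangular matrix with positive diagonal (forward substitution)."""
    p = len(K)
    inv = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for i in range(j, p):
            rhs = (1.0 if i == j else 0.0) - sum(K[i][k] * inv[k][j] for k in range(j, i))
            inv[i][j] = rhs / K[i][i]
    return inv

def element(a, K):
    """(p+1) x (p+1) matrix with first row (1, 0, ..., 0) and remaining rows (a_j, K_j)."""
    p = len(a)
    return [[1.0] + [0.0] * p] + [[a[i]] + list(K[i]) for i in range(p)]

def is_positive_lower(K):
    p = len(K)
    return (all(K[i][i] > 0 for i in range(p))
            and all(K[i][j] == 0 for i in range(p) for j in range(i + 1, p)))

a, K = [3.0, -1.0], [[2.0, 0.0], [1.0, 4.0]]        # illustrative values only
a2, K2 = [0.5, 2.0], [[1.0, 0.0], [-3.0, 0.5]]

K_inv = lower_inverse(K)
minus_Kinv_a = [-sum(K_inv[i][j] * a[j] for j in range(2)) for i in range(2)]

g = element(a, K)
g_inv = element(minus_Kinv_a, K_inv)                 # claimed inverse [-K^{-1}a, K^{-1}]
product = matmul(g, element(a2, K2))                 # should be [a + K a2, K K2]
identity = matmul(g, g_inv)

print(product)
print(identity)
print(is_positive_lower(matmul(K, K2)), is_positive_lower(K_inv))
```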
The progression model can then be written in the alternative location-scale form

$$f(E)\,dE,$$

$$Y = [\mu, \mathcal{T}]E.$$

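The notation $[a, K]$ designates the general location-scale transformation (Problem 27, Chapter One):

$$[a, K]Y = a\mathbf 1' + KY;$$

note that

$$a\mathbf 1' = \begin{pmatrix} a_1 \\ \vdots \\ a_p \end{pmatrix}(1, \ldots, 1) = \begin{pmatrix} a_1 & \cdots & a_1 \\ \vdots & & \vdots \\ a_p & \cdots & a_p \end{pmatrix},$$
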
and note that the transformation applies column by column to the matrix $Y$. The orbit of a point $Y$ is a set $GY = \{gY : g \in G\}$ in $R^{pn}$. The orbit can be examined alternatively in $R^n$ by considering a point $Y$ in $R^{pn}$ as a sequence $y_1, \ldots, y_p$ of $p$ points in $R^n$. A transformation $g$ has the following effect:

$$\begin{aligned}
y_1 &\to a_1\mathbf 1 + c_1y_1,\\
y_2 &\to a_2\mathbf 1 + k_{21}y_1 + c_2y_2,\\
&\ \ \vdots\\
y_p &\to a_p\mathbf 1 + k_{p1}y_1 + \cdots + k_{p\,p-1}y_{p-1} + c_py_p.
\end{aligned}$$

The effect of the group on $y_1$ is that of the regression-scale group with structural vector $\mathbf 1$; the effect on $y_2$ is that of the regression-scale group with structural vectors $\mathbf 1, y_1$; the effect on $y_p$ is that of the regression-scale group with structural vectors $\mathbf 1, y_1, \ldots, y_{p-1}$. Suppose now that $n > p + 1$, and $\mathbf 1, y_1, \ldots, y_p$ are linearly independent. The orbit of the first point in the sequence is $L^+(\mathbf 1; y_1)$, of the second point is $L^+(\mathbf 1, y_1; y_2)$, ..., and of the pth point is $L^+(\mathbf 1, y_1, \ldots, y_{p-1}; y_p)$:

$$\begin{aligned}
Gy_1 &= L^+(\mathbf 1; y_1),\\
&\ \ \vdots\\
Gy_p &= L^+(\mathbf 1, y_1, \ldots, y_{p-1}; y_p).
\end{aligned}$$

The vectors $y_1, \ldots, y_p$ added successively to the 1-vector generate $2, \ldots, p + 1$ dimensional spaces; the orbits for the successive elements in the sequence $y_1, \ldots, y_p$ are the positive halves of these spaces (see Figure 12).

Suppose a transformation $g$ carries $y_1, \ldots, y_p$ into $\tilde y_1, \ldots, \tilde y_p$. The assumption that $\mathbf 1, y_1, \ldots, y_p$ are linearly independent ensures that $g$ is the only such transformation. It follows that $G$ is unitary (linearly dependent sequences excluded); and it follows then that the progression model is a structural model.

10 A TRANSFORMATION VARIABLE

The transformation variable for the regression-scale group can be used to construct a transformation variable for the location-progression group.

For the first response $y_1$ let $m_1(Y)$ be the regression coefficient of $y_1$ on $\mathbf 1$, let $s_{(1)}(Y)$ be the residual length, and let $d_1(Y)$ be the unit residual vector:

$$y_1 = m_1(Y)\mathbf 1 + s_{(1)}(Y)d_1(Y).$$

The unit residual vector $d_1(Y)$ is a fixed vector in $L^+(\mathbf 1; y_1)$.
note that

$$T(Y)D(Y) = Y - m(Y)\mathbf 1' = \begin{pmatrix} y_{11} - \bar y_1 & \cdots & y_{1n} - \bar y_1 \\ \vdots & & \vdots \\ y_{p1} - \bar y_p & \cdots & y_{pn} - \bar y_p \end{pmatrix}.$$



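The progression model can now be written

$$f(E)\,dE,$$

$$[Y] = \theta[E], \qquad D(Y) = D(E).$$

The structural equation, conditional on the orbit, is $[Y] = \theta[E]$; it has the form

$$\begin{aligned}
\bar y_1 &= \mu_1 + \sigma_{(1)}\bar e_1,\\
s_{(1)}(Y) &= \sigma_{(1)}s_{(1)}(E);\\[4pt]
\bar y_2 &= \mu_2 + \tau_{21}\bar e_1 + \sigma_{(2)}\bar e_2,\\
t_{21}(Y) &= \tau_{21}s_{(1)}(E) + \sigma_{(2)}t_{21}(E),\\
s_{(2)}(Y) &= \sigma_{(2)}s_{(2)}(E);\\
&\ \ \vdots\\
\bar y_p &= \mu_p + \tau_{p1}\bar e_1 + \cdots + \tau_{p\,p-1}\bar e_{p-1} + \sigma_{(p)}\bar e_p,\\
t_{p1}(Y) &= \tau_{p1}s_{(1)}(E) + \cdots + \sigma_{(p)}t_{p1}(E),\\
&\ \ \vdots\\
s_{(p)}(Y) &= \sigma_{(p)}s_{(p)}(E).
\end{aligned}$$
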

In the alternative notation the progression model can be written

$$f(E)\,dE,$$

$$[m(Y), T(Y)] = [\mu, \mathcal{T}][m(E), T(E)], \qquad D(Y) = D(E).$$

The structural equation conditional on the orbit can then be separated into two component equations:

$$m(Y) = \mu + \mathcal{T}m(E),$$

$$T(Y) = \mathcal{T}\,T(E).$$
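The position $[m(Y), T(Y)]$ and the orbit $D(Y)$ can be computed by regressing each response on the 1-vector and the preceding unit residual vectors. The sketch below does this for $p = 2$, $n = 5$ with made-up error values and a made-up quantity ($\mu$ as `mu`, $\mathcal{T}$ as `Tau`), and checks the two component equations; `position` is a hypothetical helper written for this illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def position(rows):
    """Return (m, T, D) with rows[j] = m[j]*1 + sum_k T[j][k]*D[k];
    T is lower triangular with positive diagonal, the D[k] are orthonormal
    and orthogonal to the 1-vector."""
    p, n = len(rows), len(rows[0])
    m, T, D = [], [], []
    for j, yj in enumerate(rows):
        mean = sum(yj) / n
        dev = [v - mean for v in yj]
        coefs = [dot(dev, d) for d in D]                       # t_j1, ..., t_j,j-1
        resid = [dev[i] - sum(c * d[i] for c, d in zip(coefs, D)) for i in range(n)]
        s = math.sqrt(dot(resid, resid))                       # s_(j)
        D.append([v / s for v in resid])
        m.append(mean)
        T.append(coefs + [s] + [0.0] * (p - j - 1))
    return m, T, D

E = [[0.3, -1.2, 0.8, 1.9, -0.4],      # made-up realized errors
     [1.1, 0.2, -0.7, 0.5, -2.0]]
mu = [10.0, -3.0]                       # made-up general levels
Tau = [[2.0, 0.0], [0.5, 1.5]]          # made-up scale matrix (lower triangular)
Y = [[mu[j] + sum(Tau[j][k] * E[k][i] for k in range(2)) for i in range(5)]
     for j in range(2)]

mE, TE, DE = position(E)
mY, TY, DY = position(Y)

m_pred = [mu[j] + sum(Tau[j][k] * mE[k] for k in range(2)) for j in range(2)]
T_pred = [[sum(Tau[j][k] * TE[k][l] for k in range(2)) for l in range(2)] for j in range(2)]

print(mY, m_pred)
print(TY, T_pred)
```

The orbit is unchanged by the transformation, so `DY` and `DE` agree as well; only the position carries information about the quantity.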
11 THE REDUCED MODEL

Consider first the invariant differential on the sample space. A transformation $g$ operates column-by-column on $Y$. Its effect on the ith column is

$$\begin{pmatrix} y_{1i} \\ \vdots \\ y_{pi} \end{pmatrix} \to a + K\begin{pmatrix} y_{1i} \\ \vdots \\ y_{pi} \end{pmatrix},$$

which has Jacobian $|K| = c_1\cdots c_p$. Hence

$$J_{pn}(g) = |K|^n = |g|^n,$$

$$J_{pn}(Y) = |T(Y)|^n = |[Y]|^n = (s_{(1)}(Y)\cdots s_{(p)}(Y))^n,$$

$$dm(Y) = \frac{dY}{s_{(1)}^n(Y)\cdots s_{(p)}^n(Y)} = \frac{dY}{|[Y]|^n}.$$

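Now consider the invariant differentials on the group, for the left transformation $\tilde g = gg^*$ and the right transformation $\tilde g = g^*g$.

The left transformation operates column-by-column:

$$J = (c_1\cdots c_p)(c_1\cdots c_p)(c_2\cdots c_p)\cdots(c_p) = c_1^2\cdots c_p^{p+1},$$

$$J(g) = c_1^2\cdots c_p^{p+1} = |g|_\Delta,$$

where $|a_{ij}|_\Delta$ designates the increasing determinant,

$$\begin{vmatrix} a_{11} & & \\ \vdots & \ddots & \\ a_{k1} & a_{k2}\ \cdots & a_{kk} \end{vmatrix}_\Delta = a_{11}a_{22}^2\cdots a_{kk}^k;$$

$$d\mu(g) = \frac{\prod da_i\prod dk_{ij}\prod dc_i}{c_1^2\cdots c_p^{p+1}}.$$

The right transformation operates row-by-row:

$$J^* = (1\,c_1)(1\,c_1c_2)\cdots(1\,c_1\cdots c_p),$$

$$J^*(g) = c_1^pc_2^{p-1}\cdots c_p = |g|_\nabla,$$

where $|a_{ij}|_\nabla$ designates the decreasing determinant,

$$\begin{vmatrix} a_{11} & & \\ \vdots & \ddots & \\ a_{k1} & a_{k2}\ \cdots & a_{kk} \end{vmatrix}_\nabla = a_{11}^ka_{22}^{k-1}\cdots a_{kk};$$

$$d\nu(g) = \frac{\prod da_i\prod dk_{ij}\prod dc_i}{c_1^p\cdots c_p}.$$

The modular function is

$$\Delta(g) = \frac{|g|_\nabla}{|g|_\Delta} = \frac{c_1^p\cdots c_p}{c_1^2\cdots c_p^{p+1}}.$$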

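11.1 General Distributions. The conditional probability element for the error position $[E]$ given the orbit $D(E) = D$ is

$$g([E]:D)\,d[E] = k(D)\,f([E]D)\,|[E]|^n\,\frac{d[E]}{|[E]|_\Delta}
= k(D)\,f([E]D)\,s_{(1)}^{n-2}\cdots s_{(p)}^{n-p-1}\,d[E].$$

The structural probability element for $\theta$ given $Y$ is

$$\begin{aligned}
g^*(\theta:Y)\,d\theta
&= k(D(Y))\,f(\theta^{-1}Y)\,\frac{|[Y]|^n}{|\theta|^n}\,\frac{|[Y]|_\nabla}{|[Y]|_\Delta}\,d\nu(\theta)\\
&= k(D(Y))\,f(\theta^{-1}Y)\,\frac{|[Y]|^n}{|\theta|^n}\,\frac{|[Y]|_\nabla}{|[Y]|_\Delta}\,\frac{d\theta}{|\theta|_\nabla}\\
&= k(D(Y))\,f(\theta^{-1}Y)\Big(\frac{s_{(1)}(Y)\cdots s_{(p)}(Y)}{\sigma_{(1)}\cdots\sigma_{(p)}}\Big)^n
\frac{s_{(1)}^p(Y)\cdots s_{(p)}(Y)}{s_{(1)}^2(Y)\cdots s_{(p)}^{p+1}(Y)}\cdot\frac{d\mu\,d\mathcal{T}}{\sigma_{(1)}^p\cdots\sigma_{(p)}}.
\end{aligned}$$
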

11.2 Decomposition. The structural equation for the progression model,

$$m(Y) = \mu + \mathcal{T}m(E),$$

$$T(Y) = \mathcal{T}\,T(E),$$

can be separated into a part concerning $\mu$ and a part concerning $\mathcal{T}$:

$$T^{-1}(Y)(m(Y) - \mu) = T^{-1}(E)m(E) = t(E),$$

$$\mathcal{T}^{-1}T(Y) = T(E).$$

In a related manner the location-progression group has a location subgroup

$$L = \{[a, I] : a \in R^p\}$$

and a scale subgroup

$$S = \{[0, K] : K \in G_2\};$$

note that $S$ and $G_2$ are the same group but differ in designation and in the spaces to which they can be applied by matrix multiplication.

11.3 Location Distributions. For inference concerning $\mu$, the marginal distribution of the error variable $t = t(E) = (t_1(E), \ldots, t_p(E))'$ is needed. The full error probability distribution is

$$g([E]:D)\,d[E] = k(D)\,f([E]D)\,|[E]|^n\,\frac{d[E]}{|[E]|_\Delta}
= k(D)\,f([E]D)\,|T|^n\,\frac{dm\,dT}{|T|\,|T|_\Delta};$$

the transformation $m = Tt$ from $t$ to $m$ for fixed $T$ has Jacobian $|T|$; the marginal distribution of $t$ is obtained by substituting for $m$ and integrating out $T$:

$$g_L(t:D)\,dt = k(D)\int_T f([E]D)\,|T|^n\,\frac{dT}{|T|_\Delta}\cdot dt
= k(D)\int_T\prod_{i=1}^{n} f\left(T\begin{pmatrix} t_1 + d_{1i} \\ \vdots \\ t_p + d_{pi} \end{pmatrix}\right)|T|^n\,\frac{dT}{|T|_\Delta}\cdot dt.$$

The error variable $t = t(E)$ indexes the orbits (right cosets) of the scale group $S$ on $G^*$:

$$S[m, T] = S[0, T][t, I] = S[t, I].$$

11.4 Scale Distributions. For inference concerning $\mathcal{T}$ the marginal distribution of the error variable $T = T(E)$ is needed. The marginal distribution is obtained from the full error distribution by integrating out $m$:

$$g_S(T:D)\,dT = k(D)\int_m f([E]D)\,dm\cdot|T|^n\,\frac{dT}{|T|\,|T|_\Delta}
= k(D)\int_m f([E]D)\,dm\cdot s_{(1)}^{n-2}\cdots s_{(p)}^{n-p-1}\,dT.$$

The error variable $T = T(E)$ indexes the orbits (right cosets) of the location group $L$ on $G^*$:

$$L[m, T] = L[m, I][0, T] = L[0, T].$$

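12 WITH NORMAL ERROR

Consider the progression model with standard normal error variables:

$$f(E)\,dE = (2\pi)^{-np/2}\exp\Big\{-\tfrac12\sum_{j,i}e_{ji}^2\Big\}\prod de_{ji},$$

$$[Y] = \theta[E], \qquad D(Y) = D(E).$$

The sum of squares in the exponent of the normal density function can be expressed in matrix notation:

$$\sum_{j,i}e_{ji}^2 = \sum_{j=1}^{p}\sum_{i=1}^{n}e_{ji}^2 = \operatorname{tr}EE' - n,$$

where $E$ carries the appended row $\mathbf 1'$, which contributes $n$ to the trace. The notation $\operatorname{tr}S$ designates the trace of a square matrix:

$$\operatorname{tr}S = \sum_i s_{ii}.$$

Note that $\operatorname{tr}ABC = \operatorname{tr}BCA = \operatorname{tr}CAB$, provided the matrix operations are permissible.

The sum of squares can be expressed in terms of the position variable:

$$\begin{aligned}
\sum e_{ji}^2 &= \operatorname{tr}EE' - n\\
&= \operatorname{tr}[E]D(E)D'(E)[E]' - n\\
&= \operatorname{tr}[E]\begin{pmatrix} n & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix}[E]' - n\\
&= \operatorname{tr}[\![E]\!][\![E]\!]' - n,
\end{aligned}$$

where

$$[\![E]\!] = [E]\begin{pmatrix} \sqrt n & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix}
= \begin{pmatrix} \sqrt n & 0 \\ \sqrt n\,m(E) & T(E) \end{pmatrix}
= \begin{pmatrix}
\sqrt n & 0 & & & \\
\sqrt n\,\bar e_1 & s_{(1)}(E) & & & \\
\sqrt n\,\bar e_2 & t_{21}(E) & s_{(2)}(E) & & \\
\vdots & \vdots & & \ddots & \\
\sqrt n\,\bar e_p & t_{p1}(E) & \cdots & & s_{(p)}(E)
\end{pmatrix}.$$

The row vectors in $[E]$ describe the row vectors in $E$, but relative to basis vectors given by the rows of $D(E)$ (compare with the concluding paragraph of Section 6); the rows of $D(E)$ form an orthonormal set, except for the first row $\mathbf 1'$, which has length $\sqrt n$. With this rescaling the inner-product matrices are equal, $EE' = [\![E]\!][\![E]\!]'$. The trace of $EE'$ gives the sum of squares of elements of $E$; hence

$$\sum e_{ji}^2 = \operatorname{tr}[\![E]\!][\![E]\!]' - n = n\operatorname{tr}m(E)m'(E) + \operatorname{tr}T(E)T'(E).$$
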

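12.1 General Distribution: Error. The error probability distribution is

$$\begin{aligned}
g([E]:D)\,d[E]
&= k(D)(2\pi)^{-np/2}\exp\{-\tfrac12(\operatorname{tr}[\![E]\!][\![E]\!]' - n)\}\,|[E]|^n\,\frac{d[E]}{|[E]|_\Delta}\\
&= k'(D)\exp\Big\{-\tfrac12\Big(n\sum\bar e_j^2 + \sum t_{jk}^2 + \sum s_{(j)}^2\Big)\Big\}\,s_{(1)}^{n-2}\cdots s_{(p)}^{n-p-1}\,d[E]\\
&= \frac{A_{n-1}\cdots A_{n-p}}{(2\pi)^{np/2}}\exp\Big\{-\tfrac12 n\sum\bar e_j^2 - \tfrac12\sum t_{jk}^2 - \tfrac12\sum s_{(j)}^2\Big\}\\
&\qquad\cdot\,s_{(1)}^{n-2}\cdots s_{(p)}^{n-p-1}\prod d(\sqrt n\,\bar e_j)\prod dt_{jk}\prod ds_{(j)}.
\end{aligned}$$

The error components $\sqrt n\,\bar e_j$, $t_{jk}$ are independent standard normal variables, the error components $s_{(1)}, \ldots, s_{(p)}$ are independent chi-variables on $n - 1, \ldots, n - p$ degrees of freedom; the error distribution does not depend on the orbit $D$.

The error distribution can be described by the equation

$$\begin{pmatrix}
\sqrt n & 0 & & \cdots & 0\\
\sqrt n\,\bar e_1 & s_{(1)} & & & \\
\sqrt n\,\bar e_2 & t_{21} & s_{(2)} & & \\
\vdots & \vdots & & \ddots & \\
\sqrt n\,\bar e_p & t_{p1} & \cdots & t_{p\,p-1} & s_{(p)}
\end{pmatrix}
=
\begin{pmatrix}
\sqrt n & 0 & & \cdots & 0\\
z_1 & \chi_{n-1} & & & \\
z_2 & z_{21} & \chi_{n-2} & & \\
\vdots & \vdots & & \ddots & \\
z_p & z_{p1} & \cdots & z_{p\,p-1} & \chi_{n-p}
\end{pmatrix},$$

where the $z$'s are independent standard normal variables and the $\chi$'s are chi-variables with degrees of freedom as subscribed.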
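The description of the scale entries can be checked by simulation for $p = 2$. The sketch below generates standard normal error rows, computes $s_{(1)}$, $t_{21}$, $s_{(2)}$ directly from their definitions, and compares the mean squares with the degrees of freedom $n - 1$, $1$, $n - 2$; the sample size, seed, and number of repetitions are arbitrary.

```python
import math
import random

def scale_entries(e1, e2):
    """Entries s_(1), t_21, s_(2) of T(E) for two error rows of equal length."""
    n = len(e1)
    d1 = [v - sum(e1) / n for v in e1]            # deviations of the first row
    d2 = [v - sum(e2) / n for v in e2]            # deviations of the second row
    s1 = math.sqrt(sum(v * v for v in d1))
    t21 = sum(a * b for a, b in zip(d2, d1)) / s1  # coefficient on the unit vector d1/s1
    s2 = math.sqrt(sum(v * v for v in d2) - t21 * t21)
    return s1, t21, s2

rng = random.Random(2)
n, reps = 6, 4000
acc = [0.0, 0.0, 0.0]
for _ in range(reps):
    e1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    e2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    s1, t21, s2 = scale_entries(e1, e2)
    acc[0] += s1 * s1
    acc[1] += t21 * t21
    acc[2] += s2 * s2

mean_s1_sq, mean_t21_sq, mean_s2_sq = (a / reps for a in acc)
print(mean_s1_sq, mean_t21_sq, mean_s2_sq)   # near n - 1 = 5, 1, n - 2 = 4
```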

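12.2 General Distribution: Structural. The structural distribution for the quantity $\theta$ can be obtained from the general expression in the preceding section. The matrix form for the sum of squares in the exponent and the substitution $E = \theta^{-1}Y$ give

$$g^*(\theta:Y)\,d\theta = \frac{A_{n-1}\cdots A_{n-p}\,n^{p/2}}{(2\pi)^{np/2}}\exp\{-\tfrac12(\operatorname{tr}\theta^{-1}YY'\theta'^{-1} - n)\}
\Big(\frac{s_{(1)}(Y)\cdots s_{(p)}(Y)}{\sigma_{(1)}\cdots\sigma_{(p)}}\Big)^n
\frac{s_{(1)}^p(Y)\cdots s_{(p)}(Y)}{s_{(1)}^2(Y)\cdots s_{(p)}^{p+1}(Y)}\cdot\frac{d\mu\,d\mathcal{T}}{\sigma_{(1)}^p\cdots\sigma_{(p)}}.$$

The location and scale components in the exponential can be expressed in terms of the quantities $\mu$, $\mathcal{T}$. The formulas

$$m(Y) = \mu + \mathcal{T}m,$$

$$T(Y) = \mathcal{T}\,T$$

can be used in the expression for the sum of squares of the error variables:

$$\begin{aligned}
\sum e_{ji}^2 &= \operatorname{tr}EE' - n\\
&= n\operatorname{tr}mm' + \operatorname{tr}TT'\\
&= n\operatorname{tr}\mathcal{T}^{-1}(m(Y) - \mu)(m(Y) - \mu)'\mathcal{T}'^{-1} + \operatorname{tr}\mathcal{T}^{-1}T(Y)T'(Y)\mathcal{T}'^{-1}\\
&= n(m(Y) - \mu)'(\mathcal{T}\mathcal{T}')^{-1}(m(Y) - \mu) + \operatorname{tr}(\mathcal{T}\mathcal{T}')^{-1}(T(Y)T'(Y))\\
&= n(m(Y) - \mu)'\Sigma^{-1}(m(Y) - \mu) + \operatorname{tr}\Sigma^{-1}S(Y).
\end{aligned}$$

The inner-product matrix $\mathcal{T}\mathcal{T}'$ is designated

$$\Sigma = \mathcal{T}\mathcal{T}',$$

and the inner-product matrix $T(Y)T'(Y)$ is designated

$$S(Y) = T(Y)T'(Y) = T(Y)D(Y)D'(Y)T'(Y) = \Big(\sum_{i=1}^{n}(y_{ji} - \bar y_j)(y_{ki} - \bar y_k)\Big).$$

Note that $S(Y)$ is the inner product matrix for the response deviation vectors

$$y_j - \bar y_j\mathbf 1 = (y_{j1} - \bar y_j, \ldots, y_{jn} - \bar y_j)'.$$
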
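The structural distribution can now be written:

$$\begin{aligned}
g^*(\theta:Y)\,d\theta
&= \frac{A_{n-1}\cdots A_{n-p}\,n^{p/2}}{(2\pi)^{np/2}}\exp\{-\tfrac12(m(Y) - \mu)'n\Sigma^{-1}(m(Y) - \mu) - \tfrac12\operatorname{tr}\Sigma^{-1}S(Y)\}\\
&\qquad\cdot\Big(\frac{s_{(1)}(Y)\cdots s_{(p)}(Y)}{\sigma_{(1)}\cdots\sigma_{(p)}}\Big)^n
\frac{s_{(1)}^p(Y)\cdots s_{(p)}(Y)}{s_{(1)}^2(Y)\cdots s_{(p)}^{p+1}(Y)}\cdot\frac{d\mu\,d\mathcal{T}}{\sigma_{(1)}^p\cdots\sigma_{(p)}}.
\end{aligned}$$

Note that the conditional distribution of $\mu$ given $\mathcal{T}$ is normal with mean $m(Y)$ and covariance matrix $n^{-1}\Sigma$; the marginal distribution for $\mathcal{T}$ can be described in terms of normal and chi-variables by the relation $T(Y) = \mathcal{T}\,T(E)$:

$$\mathcal{T} = T(Y)\begin{pmatrix}
\chi_{n-1} & & & 0\\
z_{21} & \chi_{n-2} & & \\
\vdots & & \ddots & \\
z_{p1} & \cdots & z_{p\,p-1} & \chi_{n-p}
\end{pmatrix}^{-1}.$$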

12.3 The Error Inner-Product Distribution. The error scale matrix $T(E)$ has a distribution described by

$$T(E) = \begin{pmatrix}
s_{(1)} & & & 0\\
t_{21} & s_{(2)} & & \\
\vdots & & \ddots & \\
t_{p1} & \cdots & t_{p\,p-1} & s_{(p)}
\end{pmatrix}
= \begin{pmatrix}
\chi_{n-1} & & & 0\\
z_{21} & \chi_{n-2} & & \\
\vdots & & \ddots & \\
z_{p1} & \cdots & z_{p\,p-1} & \chi_{n-p}
\end{pmatrix},$$

with probability element

$$\frac{A_{n-1}\cdots A_{n-p}}{(2\pi)^{(n-1)p/2}}\exp\Big\{-\tfrac12\sum t_{jk}^2 - \tfrac12\sum s_{(j)}^2\Big\}\,s_{(1)}^{n-2}\cdots s_{(p)}^{n-p-1}\,dT.$$

A closely related matrix is the error inner-product matrix:

$$S(E) = T(E)T'(E) = T(E)D(E)D'(E)T'(E) = \Big(\sum_{i=1}^{n}(e_{ji} - \bar e_j)(e_{ki} - \bar e_k)\Big).$$

For the transformation from $T$ to $S = TT'$, the Jacobian matrix is lower-triangular, and the Jacobian determinant is then the product of the diagonal elements:

$$\Big|\frac{\partial S}{\partial T}\Big| = 2t_{11}\,(t_{11}\,2t_{22})\cdots(t_{11}t_{22}\cdots 2t_{pp}) = 2^p|T|_\nabla.$$

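The probability element for $S$ is obtained by substitution:

$$\begin{aligned}
&\frac{A_{n-1}\cdots A_{n-p}}{(2\pi)^{(n-1)p/2}}\exp\{-\tfrac12\operatorname{tr}S\}\,\frac{|T|^{n-1}}{|T|_\Delta}\,\frac{dS}{2^p|T|_\nabla}\\
&\quad= \frac{A_{n-1}\cdots A_{n-p}}{(2\pi)^{(n-1)p/2}}\exp\{-\tfrac12\operatorname{tr}S\}\,|S|^{(n-1)/2}\,\frac{dS}{2^p|S|^{(p+1)/2}}\\
&\quad= \frac{A_{n-1}\cdots A_{n-p}}{(2\pi)^{(n-1)p/2}}\exp\{-\tfrac12(s_{11} + \cdots + s_{pp})\}\,|S|^{(n-p-2)/2}\,\frac{ds_{11}(ds_{21}\,ds_{22})\cdots(ds_{p1}\cdots ds_{pp})}{2^p}.
\end{aligned}$$

The density applies to all points $S$ for which the matrix $S$ is positive definite. The distribution of $S$ is the standard Wishart distribution.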
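The Jacobian of the transformation from $T$ to $S = TT'$ can be checked for a small case. The sketch below builds the matrix of partial derivatives for $p = 3$, with the lower-triangle entries ordered row by row, confirms that it is lower triangular, and compares the product of its diagonal with $2^p$ times the decreasing determinant of $T$. The matrix entries are illustrative, and the function names are hypothetical.

```python
def jacobian_of_inner_product(T):
    """Partial derivatives d s_ij / d t_kl for S = T T', lower triangles ordered row by row."""
    p = len(T)
    idx = [(i, j) for i in range(p) for j in range(i + 1)]
    J = []
    for i, j in idx:                 # s_ij = sum_l t_il * t_jl
        row = []
        for k, l in idx:             # derivative with respect to t_kl
            d = 0.0
            if k == i:
                d += T[j][l]
            if k == j:
                d += T[i][l]
            row.append(d)
        J.append(row)
    return J

def decreasing_determinant(T):
    """t_11^p * t_22^(p-1) * ... * t_pp for a lower triangular matrix."""
    p = len(T)
    out = 1.0
    for i in range(p):
        out *= T[i][i] ** (p - i)
    return out

T = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [-1.0, 0.5, 1.5]]               # illustrative positive lower triangular matrix

J = jacobian_of_inner_product(T)
size = len(J)
is_lower = all(J[a][b] == 0.0 for a in range(size) for b in range(a + 1, size))

det_J = 1.0
for a in range(size):                # determinant of a triangular matrix
    det_J *= J[a][a]

print(is_lower, det_J, 2 ** 3 * decreasing_determinant(T))
```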
The regression model as developed in this chapter can be used to generate a corresponding classical model $f(y:\beta, \sigma)$. The classical model $f(y:\beta, \sigma)$ gives the distribution of possible response vectors $y$ based on a fixed value of the quantity $(\beta, \sigma)$:

$$\prod_{i=1}^{n} f(e_i)\prod_{i=1}^{n}de_i = \prod_{i=1}^{n} f\Big(\frac{y_i - \sum_u\beta_uv_{ui}}{\sigma}\Big)\frac{\prod_{i=1}^{n}dy_i}{\sigma^n} = f(y:\beta, \sigma)\,dy.$$

The classical regression model with normal error has an extensive literature. Its essential form appears in the work of Legendre, Gauss, and Laplace and is tied closely to the classical methods of least squares. Some current books on the topic are Plackett (1960), Scheffé (1959), and Williams (1959).

The classical model with other error forms has received little attention. Classical theory does not produce a set of natural variables to be in correspondence with the parameters $\beta$, $\sigma$. And the use of the regression variables of normal and least-squares theory leads to the intractable problem of finding the marginal distribution of these variables. The little attention that has been accorded the nonnormal model has been concerned with examining the methods of normal theory as used with an error form that departs modestly from normality.

The structural distribution for $(\beta, \sigma)$ was developed in another framework as a distribution for $(\beta, \sigma)$: Fraser (1961), Verhagen (1961). The development here follows closely that in Fraser (1967).
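The progression model also leads to a corresponding classical model. For single values on each response variable, the distribution describing possible $(y_1, \ldots, y_p)$ for given $\mu$, $\mathcal{T}$ is

$$f(e_1, \ldots, e_p)\prod de_j = f\left(\mathcal{T}^{-1}\begin{pmatrix} y_1 - \mu_1 \\ \vdots \\ y_p - \mu_p \end{pmatrix}\right)\frac{\prod dy_j}{|\mathcal{T}|} = f(y_1, \ldots, y_p:\mu, \mathcal{T})\prod dy_j.$$

With standard normal error form the distribution for $(y_1, \ldots, y_p)$ is

$$\begin{aligned}
f(y_1, \ldots, y_p:\mu, \mathcal{T})\prod dy_j
&= \frac{|\mathcal{T}|^{-1}}{(2\pi)^{p/2}}\exp\left\{-\tfrac12\begin{pmatrix} y_1 - \mu_1 \\ \vdots \\ y_p - \mu_p \end{pmatrix}'\mathcal{T}'^{-1}\mathcal{T}^{-1}\begin{pmatrix} y_1 - \mu_1 \\ \vdots \\ y_p - \mu_p \end{pmatrix}\right\}\prod dy_j\\
&= \frac{|\Sigma|^{-1/2}}{(2\pi)^{p/2}}\exp\left\{-\tfrac12\begin{pmatrix} y_1 - \mu_1 \\ \vdots \\ y_p - \mu_p \end{pmatrix}'\Sigma^{-1}\begin{pmatrix} y_1 - \mu_1 \\ \vdots \\ y_p - \mu_p \end{pmatrix}\right\}\prod dy_j;
\end{aligned}$$

the distribution is multivariate normal with mean $(\mu_1, \ldots, \mu_p)$ and covariance matrix $\Sigma = \mathcal{T}\mathcal{T}'$.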
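For $n$ values on each response variable, the distribution describing $Y$ is

$$\begin{aligned}
f(Y:\mu, \Sigma)\,dY
&= \frac{|\mathcal{T}|^{-n}}{(2\pi)^{np/2}}\exp\{-\tfrac12\operatorname{tr}(Y - \mu\mathbf 1')'\mathcal{T}'^{-1}\mathcal{T}^{-1}(Y - \mu\mathbf 1')\}\,dY\\
&= \frac{|\Sigma|^{-n/2}}{(2\pi)^{np/2}}\exp\{-\tfrac12\operatorname{tr}(Y - \mu\mathbf 1')'\Sigma^{-1}(Y - \mu\mathbf 1')\}\,dY.
\end{aligned}$$

The marginal distribution of $m(Y) = (\bar y_1, \ldots, \bar y_p)'$ can be obtained by the change of variable $m(Y) = \mu + \mathcal{T}m$ from the conditional (also marginal) distribution of $m = m(E)$:

$$f(m(Y):\mu, \Sigma)\,dm(Y) = \frac{n^{p/2}|\Sigma|^{-1/2}}{(2\pi)^{p/2}}\exp\left\{-\frac n2\begin{pmatrix} \bar y_1 - \mu_1 \\ \vdots \\ \bar y_p - \mu_p \end{pmatrix}'\Sigma^{-1}\begin{pmatrix} \bar y_1 - \mu_1 \\ \vdots \\ \bar y_p - \mu_p \end{pmatrix}\right\}\prod d\bar y_j.$$

The marginal distribution of $T(Y)$ can be obtained by the change of variable $T(Y) = \mathcal{T}\,T$ from the conditional (also marginal) distribution of $T = T(E)$:

$$\begin{aligned}
\frac{A_{n-1}\cdots A_{n-p}}{(2\pi)^{(n-1)p/2}}&\exp\{-\tfrac12\operatorname{tr}TT'\}\,|T|^{n-1}\,\frac{dT}{|T|_\Delta}\\
&= \frac{A_{n-1}\cdots A_{n-p}}{(2\pi)^{(n-1)p/2}}\exp\{-\tfrac12\operatorname{tr}\mathcal{T}^{-1}T(Y)T'(Y)\mathcal{T}'^{-1}\}\,|\mathcal{T}|^{-(n-1)}\,|T(Y)|^{n-1}\,\frac{dT(Y)}{|T(Y)|_\Delta}\\
&= \frac{A_{n-1}\cdots A_{n-p}}{(2\pi)^{(n-1)p/2}}\exp\{-\tfrac12\operatorname{tr}\Sigma^{-1}S(Y)\}\,\frac{|S(Y)|^{(n-1)/2}}{|\Sigma|^{(n-1)/2}}\,\frac{dT(Y)}{|T(Y)|_\Delta}\\
&= f(T(Y):\mu, \Sigma)\,dT(Y).
\end{aligned}$$

The variables $m(Y)$ and $T(Y)$ are statistically independent. The distribution of the inner product matrix $S(Y)$ can be derived by the results in Section 12:

$$f(S(Y):\mu, \Sigma)\,dS(Y) = \frac{A_{n-1}\cdots A_{n-p}}{(2\pi)^{(n-1)p/2}}\exp\{-\tfrac12\operatorname{tr}\Sigma^{-1}S(Y)\}\,\frac{|S(Y)|^{(n-1)/2}}{|\Sigma|^{(n-1)/2}}\,\frac{dS(Y)}{2^p|S(Y)|^{(p+1)/2}}.$$

This is the general Wishart distribution with covariance matrix $\Sigma$ and degrees of freedom $n - 1$.

The progression model with normal error leads to the classical multivariate normal distribution and to related distributions for the sample means and the sample covariance matrix. The internal structuring provided here by the progression model is, however, rather specialized; a more generally appropriate structural model relating to the multivariate normal is examined in Chapter Five.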
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PROBLEMS 

1. Consider a measuring instrument with error distribution f(e) de and suppose that four 

measurements have been made on a quantity jx^, three measurements on a quantity /.i 2 , and 

three measurements on a quantity /u z : 


n/Mnr de ip 


- 1 

1 

1 

1 

0 

0 

0 

0 

0 

0 " 

0 

0 

0 

0 

1 

1 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

1 

1 

.I/ll 

Viz 

2/13 

2/n 

y%i 

Viz 

1/23 

2/31 

1/32 

2/33^ 



Note that the structural vectors, designated $v_1, v_2, v_3$, are mutually orthogonal.

(i) Show that the projection of y into the subspace $L(v_1, v_2, v_3)$ is

$$\bar y_1 v_1 + \bar y_2 v_2 + \bar y_3 v_3 = (\bar y_1, \bar y_1, \bar y_1, \bar y_1, \bar y_2, \bar y_2, \bar y_2, \bar y_3, \bar y_3, \bar y_3),$$

where $\bar y_1 = \sum_j y_{1j}/4$, $\bar y_2 = \sum_j y_{2j}/3$, $\bar y_3 = \sum_j y_{3j}/3$. This projection gives the location of y
relative to the subspace $L(v_1, v_2, v_3)$.

(ii) Check that the residual vector is

$$s(y)\, d(y) = (y_{11} - \bar y_1,\ y_{12} - \bar y_1,\ y_{13} - \bar y_1,\ y_{14} - \bar y_1,\ y_{21} - \bar y_2,\ y_{22} - \bar y_2,\ y_{23} - \bar y_2,\ y_{31} - \bar y_3,\ y_{32} - \bar y_3,\ y_{33} - \bar y_3)$$

and show that the squared residual length is

$$s^2(y) = \sum_j (y_{1j} - \bar y_1)^2 + \sum_j (y_{2j} - \bar y_2)^2 + \sum_j (y_{3j} - \bar y_3)^2$$
$$= \sum_{ij} y_{ij}^2 - (4 \bar y_1^2 + 3 \bar y_2^2 + 3 \bar y_3^2)$$
$$= \sum_{ij} y_{ij}^2 - \left[ \frac{(\sum_j y_{1j})^2}{4} + \frac{(\sum_j y_{2j})^2}{3} + \frac{(\sum_j y_{3j})^2}{3} \right].$$

Interpret each expression in terms of the geometry. The length s(y) gives the scale of y
relative to the subspace $L(v_1, v_2, v_3)$.
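
The projection and the three expressions for the squared residual length can be checked numerically. The following is a minimal sketch in Python; the measurement values are hypothetical and are not taken from the text, and the helper name `dot` is an illustrative choice.

```python
# Hypothetical measurements: 4 on the first quantity, 3 on the second, 3 on the third.
y = [5.1, 4.9, 5.3, 5.0, 7.2, 7.0, 6.9, 3.1, 2.8, 3.0]
sizes = [4, 3, 3]

def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))

# Indicator (structural) vectors v1, v2, v3 for the three groups.
v, start = [], 0
for n in sizes:
    v.append([1.0 if start <= k < start + n else 0.0 for k in range(len(y))])
    start += n

# Projection coefficients (v_i, y)/(v_i, v_i) are the group averages.
means = [dot(vi, y) / dot(vi, vi) for vi in v]
projection = [sum(m * vi[k] for m, vi in zip(means, v)) for k in range(len(y))]
residual = [yk - pk for yk, pk in zip(y, projection)]

# Three expressions for the squared residual length s^2(y).
s2_direct = dot(residual, residual)
s2_means = dot(y, y) - sum(n * m * m for n, m in zip(sizes, means))
s2_sums = dot(y, y) - sum(dot(vi, y) ** 2 / n for vi, n in zip(v, sizes))

print(means, s2_direct, s2_means, s2_sums)
```

The residual is orthogonal to each structural vector, which is the geometric content of the three equal expressions.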
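(iii) The position and reference-point decomposition of Y is

$$Y = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\bar y_1 & \bar y_2 & \bar y_3 & s(y)
\end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ d(y) \end{pmatrix}.$$

Check that the reduced structural equation has components

$$\bar y_1 = \mu_1 + \sigma \bar e_1,$$
$$\bar y_2 = \mu_2 + \sigma \bar e_2,$$
$$\bar y_3 = \mu_3 + \sigma \bar e_3,$$
$$s(y) = \sigma s(e).$$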

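(iv) Show that the positive-lower-triangular and orthogonal factorization of Y is

$$Y = \begin{pmatrix}
\sqrt4 & 0 & 0 & 0 \\
0 & \sqrt3 & 0 & 0 \\
0 & 0 & \sqrt3 & 0 \\
\sqrt4\, \bar y_1 & \sqrt3\, \bar y_2 & \sqrt3\, \bar y_3 & s(y)
\end{pmatrix}
\begin{pmatrix}
\frac{1}{\sqrt4} & \frac{1}{\sqrt4} & \frac{1}{\sqrt4} & \frac{1}{\sqrt4} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \frac{1}{\sqrt3} & \frac{1}{\sqrt3} & \frac{1}{\sqrt3} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{\sqrt3} & \frac{1}{\sqrt3} & \frac{1}{\sqrt3} \\
\frac{y_{11} - \bar y_1}{s(y)} & \frac{y_{12} - \bar y_1}{s(y)} & \frac{y_{13} - \bar y_1}{s(y)} & \frac{y_{14} - \bar y_1}{s(y)} & \frac{y_{21} - \bar y_2}{s(y)} & \frac{y_{22} - \bar y_2}{s(y)} & \frac{y_{23} - \bar y_2}{s(y)} & \frac{y_{31} - \bar y_3}{s(y)} & \frac{y_{32} - \bar y_3}{s(y)} & \frac{y_{33} - \bar y_3}{s(y)}
\end{pmatrix}$$

and check that the analysis-of-variance table is

Source     Dimension   Component                                  Structure of Component
First      1           $4\bar y_1^2 = (\sum_j y_{1j})^2/4$        $(\mu_1 \sqrt4 + \sigma \bar e_1 \sqrt4)^2$
Second     1           $3\bar y_2^2 = (\sum_j y_{2j})^2/3$        $(\mu_2 \sqrt3 + \sigma \bar e_2 \sqrt3)^2$
Third      1           $3\bar y_3^2 = (\sum_j y_{3j})^2/3$        $(\mu_3 \sqrt3 + \sigma \bar e_3 \sqrt3)^2$
Residual   7           $s^2(y)$                                   $(\sigma s(e))^2$
Total      10          $\sum y_{ij}^2$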
2. (Continuation). (i) Show that the invariant differential for the transformation
$\tilde y = m_1 v_1 + m_2 v_2 + m_3 v_3 + c\,y$ is

$$\frac{dy}{s^{10}(y)}$$

as based on Euclidean volume at d(y). For the left and right transformations described by



show that the left and right invariant differentials on the group are

$$\frac{dm_1\, dm_2\, dm_3\, dc}{c^4}, \qquad \frac{dm_1\, dm_2\, dm_3\, dc}{c}.$$

(ii) Show that the error probability distribution for the reduced model is 




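$$\frac{A_7}{A_{10}}\, \frac{\sqrt4 \sqrt3 \sqrt3\; dt_1\, dt_2\, dt_3}{(1 + 4t_1^2 + 3t_2^2 + 3t_3^2)^{5}}$$

and of the t's individually is

$$\frac{A_7}{A_8}\, \frac{\sqrt4\, dt_1}{(1 + 4t_1^2)^4}, \qquad \frac{A_7}{A_8}\, \frac{\sqrt3\, dt_2}{(1 + 3t_2^2)^4}, \qquad \frac{A_7}{A_8}\, \frac{\sqrt3\, dt_3}{(1 + 3t_3^2)^4}.$$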

(v) For the case of normal error give equations that describe the structural distribution
of $(\mu_1, \mu_2, \mu_3, \sigma)$ in terms of normal and chi variables, the structural distribution of
$(\mu_1, \mu_2, \mu_3)$ in terms of simplified t-variables, and the structural distribution of $\sigma$ in terms
of a chi variable.

3. The model in Problems 1 and 2 could apply equally to a process with stable internal
error f(e) de and with four response observations under a first set of conditions, three
response observations under a second set, and three under a third. If the general response
level is unaffected by change of conditions, the measurement model in Chapter One is
appropriate. If the general response level is viewed as dependent on the conditions, the model
in Problem 1 is appropriate. The general response-level vector in the first case lies in $L(1)$
and in the second case, in $L(v_1, v_2, v_3)$ with v's as defined in Problem 1. The change from
the one-dimensional subspace $L(1)$ to the three-dimensional subspace $L(v_1, v_2, v_3)$ requires
effectively two additional vectors. The two additional vectors could be chosen from
$v_1, v_2, v_3$, or they could be constructed directly with a view toward orthogonality and ease
of interpretation.


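$$\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
-\tfrac{3}{7} & -\tfrac{3}{7} & -\tfrac{3}{7} & -\tfrac{3}{7} & \tfrac{4}{7} & \tfrac{4}{7} & \tfrac{4}{7} & 0 & 0 & 0 \\
-\tfrac{3}{10} & -\tfrac{3}{10} & -\tfrac{3}{10} & -\tfrac{3}{10} & -\tfrac{3}{10} & -\tfrac{3}{10} & -\tfrac{3}{10} & \tfrac{7}{10} & \tfrac{7}{10} & \tfrac{7}{10} \\
y_{11} & y_{12} & y_{13} & y_{14} & y_{21} & y_{22} & y_{23} & y_{31} & y_{32} & y_{33}
\end{pmatrix}$$

$$= \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\alpha_1 & \alpha_2 & \alpha_3 & \sigma
\end{pmatrix}
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
-\tfrac{3}{7} & -\tfrac{3}{7} & -\tfrac{3}{7} & -\tfrac{3}{7} & \tfrac{4}{7} & \tfrac{4}{7} & \tfrac{4}{7} & 0 & 0 & 0 \\
-\tfrac{3}{10} & -\tfrac{3}{10} & -\tfrac{3}{10} & -\tfrac{3}{10} & -\tfrac{3}{10} & -\tfrac{3}{10} & -\tfrac{3}{10} & \tfrac{7}{10} & \tfrac{7}{10} & \tfrac{7}{10} \\
e_{11} & e_{12} & e_{13} & e_{14} & e_{21} & e_{22} & e_{23} & e_{31} & e_{32} & e_{33}
\end{pmatrix}.$$
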

Note that the three structural vectors, to be designated $w_1, w_2, w_3$, are mutually orthogonal.
The usual convention has w-vectors obtained by successive orthogonalization of v-vectors.
This convention does not apply here; the w-vectors are not derived† from the use of two
v-vectors in addition to the 1-vector. The w-notation is used merely to suggest a constructed
orthogonal set.

(i) Verify the interpretations: $\alpha_1$ is the average general-response level for the 10
performances, $\alpha_2$ is the general level for the second conditions as it exceeds the general level
for the first conditions, $\alpha_3$ is the general level for the third conditions as it exceeds the average
general level for the seven performances with the first two conditions.

† For comparison, construct an orthogonal set by successively orthogonalizing $1, v_2, v_3$.






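(ii) Show that the projection of y into the subspace $L(w_1, w_2, w_3)$ is

$$a_1(y) w_1 + a_2(y) w_2 + a_3(y) w_3 = (\bar y_1, \bar y_1, \bar y_1, \bar y_1, \bar y_2, \bar y_2, \bar y_2, \bar y_3, \bar y_3, \bar y_3),$$

where

$$a_1(y) = \frac{(w_1, y)}{(w_1, w_1)} = \frac{\sum y_{ij}}{10} = \bar y,$$
$$a_2(y) = \frac{(w_2, y)}{(w_2, w_2)} = \bar y_2 - \bar y_1,$$
$$a_3(y) = \frac{(w_3, y)}{(w_3, w_3)} = \bar y_3 - \frac{4 \bar y_1 + 3 \bar y_2}{7}.$$

Check that the squared residual length is

$$s^2(y) = \sum_j (y_{1j} - \bar y_1)^2 + \sum_j (y_{2j} - \bar y_2)^2 + \sum_j (y_{3j} - \bar y_3)^2$$
$$= \sum_{ij} y_{ij}^2 - 10 \bar y^2 - \frac{12}{7} (\bar y_2 - \bar y_1)^2 - \frac{21}{10} \left( \bar y_3 - \frac{4 \bar y_1 + 3 \bar y_2}{7} \right)^2.$$

The reduced structural equation has components

$$a_1(y) = \alpha_1 + \sigma a_1(e),$$
$$a_2(y) = \alpha_2 + \sigma a_2(e),$$
$$a_3(y) = \alpha_3 + \sigma a_3(e),$$
$$s(y) = \sigma s(e).$$
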
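(iii) Show that the positive-lower-triangular and orthogonal factorization of Y is

$$Y = \begin{pmatrix}
\sqrt{10} & 0 & 0 & 0 \\
0 & \sqrt{12/7} & 0 & 0 \\
0 & 0 & \sqrt{21/10} & 0 \\
\sqrt{10}\, \bar y & \sqrt{12/7}\, a_2(y) & \sqrt{21/10}\, a_3(y) & s(y)
\end{pmatrix}
\begin{pmatrix}
\frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} \\
\frac{-3}{\sqrt{84}} & \frac{-3}{\sqrt{84}} & \frac{-3}{\sqrt{84}} & \frac{-3}{\sqrt{84}} & \frac{4}{\sqrt{84}} & \frac{4}{\sqrt{84}} & \frac{4}{\sqrt{84}} & 0 & 0 & 0 \\
\frac{-3}{\sqrt{210}} & \frac{-3}{\sqrt{210}} & \frac{-3}{\sqrt{210}} & \frac{-3}{\sqrt{210}} & \frac{-3}{\sqrt{210}} & \frac{-3}{\sqrt{210}} & \frac{-3}{\sqrt{210}} & \frac{7}{\sqrt{210}} & \frac{7}{\sqrt{210}} & \frac{7}{\sqrt{210}} \\
d_{11} & d_{12} & d_{13} & d_{14} & d_{21} & d_{22} & d_{23} & d_{31} & d_{32} & d_{33}
\end{pmatrix}.$$

Check that the analysis-of-variance table is

Source     Dimension   Component                                                              Structure of Component
First      1           $10\, \bar y^2$                                                        $(\alpha_1 \sqrt{10} + \sigma \bar e \sqrt{10})^2$
Second     1           $(\bar y_2 - \bar y_1)^2\, \tfrac{12}{7}$                              $(\alpha_2 \sqrt{12/7} + \sigma (\bar e_2 - \bar e_1) \sqrt{12/7})^2$
Third      1           $(\bar y_3 - \tfrac{4 \bar y_1 + 3 \bar y_2}{7})^2\, \tfrac{21}{10}$   $(\alpha_3 \sqrt{21/10} + \sigma (\bar e_3 - \tfrac{4 \bar e_1 + 3 \bar e_2}{7}) \sqrt{21/10})^2$
Residual   7           $s^2(y) = \sum (y_{ij} - \bar y_i)^2$                                  $(\sigma s(e))^2$
Total      10          $\sum y_{ij}^2$
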
Note that the first three components in Problem 1 have, in effect, been combined and new
first components formed: one set of orthogonal axes in $L(v_1, v_2, v_3) = L(w_1, w_2, w_3)$ has
been replaced by another set of orthogonal axes.
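
The orthogonality of the constructed vectors and the stated projection coefficients can be verified with exact rational arithmetic. This is a minimal sketch; the response values are hypothetical, and the names `w1`, `w2`, `w3`, `a1`, `a2`, `a3` simply mirror the symbols of Problem 3.

```python
from fractions import Fraction as F

def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))

# The constructed orthogonal vectors of Problem 3 (group sizes 4, 3, 3).
w1 = [F(1)] * 10
w2 = [F(-3, 7)] * 4 + [F(4, 7)] * 3 + [F(0)] * 3
w3 = [F(-3, 10)] * 7 + [F(7, 10)] * 3

# Hypothetical responses, used only to exercise the formulas.
y = [F(k) for k in (5, 4, 6, 5, 7, 8, 6, 3, 2, 4)]

# Projection coefficients (w, y)/(w, w).
a1 = dot(w1, y) / dot(w1, w1)
a2 = dot(w2, y) / dot(w2, w2)
a3 = dot(w3, y) / dot(w3, w3)

# Group averages for comparison.
ybar1 = sum(y[:4]) / 4
ybar2 = sum(y[4:7]) / 3
ybar3 = sum(y[7:]) / 3

print(a1, a2, a3)
```

The squared lengths of the three vectors are 10, 12/7, and 21/10, which are the scale factors that appear in the analysis-of-variance components.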

4. (Continuation). (i) Check that the usual error probability distributions for the reduced
model are

fc(d) ! 1 + a 2 w 2 .ij + a 3 w 3 ii + sc ^ij) s V P&iff 

I* oo _ . ■ 

Ar(d) U f(s^ V'bti + h w 3 H "t" dij )) s9 ds ' dt. 


>jfl™ 


■ a i w Ui a 2 w 2 H a 3 w 3 ij "l" sd^da • s ds. 


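(ii) For the case of normal error

$$\prod f(e_{ij}) = (2\pi)^{-5} \exp\{-\tfrac{1}{2} \sum e_{ij}^2\}$$

verify that the reduced model can be represented, with independent standard normal
variables $z_1, z_2, z_3$ and a chi variable $\chi_7$ on 7 degrees of freedom, as

$$a_1(y) \sqrt{10} = \alpha_1 \sqrt{10} + \sigma z_1,$$
$$a_2(y) \sqrt{12/7} = \alpha_2 \sqrt{12/7} + \sigma z_2,$$
$$a_3(y) \sqrt{21/10} = \alpha_3 \sqrt{21/10} + \sigma z_3,$$
$$s(y) = \sigma \chi_7.$$

Also verify that the error probability distribution of $(t_1, t_2, t_3)$ is

$$\frac{A_7}{A_{10}}\, \frac{\sqrt{10}\, \sqrt{12/7}\, \sqrt{21/10}\; dt_1\, dt_2\, dt_3}{(1 + 10 t_1^2 + (12/7) t_2^2 + (21/10) t_3^2)^{5}}$$

and of $t_3$, for example, is

$$\frac{A_7}{A_8}\, \frac{\sqrt{21/10}\; dt_3}{(1 + (21/10) t_3^2)^{4}}.$$

(iii) For the case of normal error give equations that describe the structural distributions
of $(\alpha_1, \alpha_2, \alpha_3, \sigma)$, $(\alpha_1, \alpha_2, \alpha_3)$, and $\sigma$ (use normal, chi, and simplified t-variables).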

5. Consider a process with stable error f(e) de and a response y whose general level is known
to depend linearly on a controllable variable x. For five response observations the model
would be

$$\prod f(e_i) \prod de_i,$$

$$\begin{pmatrix}
1 & 1 & 1 & 1 & 1 \\
x_1 & x_2 & x_3 & x_4 & x_5 \\
y_1 & y_2 & y_3 & y_4 & y_5
\end{pmatrix}
= \begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
\beta_0 & \beta_1 & \sigma
\end{pmatrix}
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 \\
x_1 & x_2 & x_3 & x_4 & x_5 \\
e_1 & e_2 & e_3 & e_4 & e_5
\end{pmatrix}.$$

(i) Show that the projection of y into the subspace $L(1, x)$ is $b_0(y) 1 + b_1(y) x$, where

$$b_1(y) = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2}, \qquad b_0(y) = \bar y - b_1(y)\, \bar x,$$

and show that the squared residual length is

$$s^2(y) = \sum (y_i - b_0(y) - b_1(y) x_i)^2$$
$$= \sum (y_i - \bar y - b_1(y)(x_i - \bar x))^2$$
$$= \sum y_i^2 - \frac{(\sum y_i)^2}{5} - \frac{(\sum (x_i - \bar x) y_i)^2}{\sum (x_i - \bar x)^2}.$$

(ii) The reduced structural equation has components

$$b_0(y) = \beta_0 + \sigma b_0(e),$$
$$b_1(y) = \beta_1 + \sigma b_1(e),$$
$$s(y) = \sigma s(e).$$

Give expressions for the error probability distribution of $(b_0(e), b_1(e), s(e))$, $(t_0(e), t_1(e))$,
and $s(e)$.

(iii) For the case of normal error

$$\prod f(e_i) = (2\pi)^{-5/2} \exp\{-\tfrac{1}{2} \sum e_i^2\}$$

record the error probability elements for $(b_0(e), b_1(e), s(e))$, $(t_0(e), t_1(e))$, and $s(e)$. (The
quadratic expressions have cross-product terms: the general-case nonorthogonality of 1
and x.)
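
The projection coefficients and the three forms of the squared residual length in Problem 5(i) can be checked with a short computation. This is a sketch under stated assumptions: the five (x, y) pairs are hypothetical, and the variable names are illustrative.

```python
# Hypothetical levels of the controllable variable and responses.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Coefficients of the projection of y into L(1, x).
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Three expressions for the squared residual length s^2(y).
s2_a = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
s2_b = sum((yi - ybar - b1 * (xi - xbar)) ** 2 for xi, yi in zip(x, y))
s2_c = (sum(yi * yi for yi in y) - sum(y) ** 2 / n
        - sum((xi - xbar) * yi for xi, yi in zip(x, y)) ** 2 / sxx)

# The residual vector is orthogonal to both 1 and x.
residual = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
print(b0, b1, s2_a, s2_b, s2_c)
```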

6. (Continuation). Consider the preceding model in orthogonal form



(i) Express the $\alpha$'s in terms of the $\beta$'s with $P^{-1}$ and the $\beta$'s in terms of the $\alpha$'s with P.

(ii) Show that the projection of y into the subspace $L(w_1, w_2)$ is

$$a_0(y)\, 1 + a_1(y)(x - \bar x 1),$$

where

$$a_0(y) = \bar y, \qquad a_1(y) = \frac{\sum (x_i - \bar x)\, y_i}{\sum (x_i - \bar x)^2} = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2}.$$

Also verify that the error probability distribution of $(t_0(e), t_1(e))$ is

$$\frac{A_3}{A_5}\, \frac{\sqrt5\, \sqrt{\sum (x_i - \bar x)^2}\; dt_0\, dt_1}{(1 + 5 t_0^2 + \sum (x_i - \bar x)^2\, t_1^2)^{5/2}}$$

and of $t_1(e)$, for example, is

$$\frac{A_3}{A_4}\, \frac{\sqrt{\sum (x_i - \bar x)^2}\; dt_1}{(1 + \sum (x_i - \bar x)^2\, t_1^2)^{2}}.$$

(vi) For the case of normal error give equations that describe the structural distributions
of $(\alpha_0, \alpha_1, \sigma)$, $(\alpha_0, \alpha_1)$, $\alpha_1$, and $\sigma$ (use normal, chi, and simplified t-variables).

Note: On the assumption that there is no dependence on the controllable variable, the 
measurement model of Chapter One is appropriate. On the assumption that the dependence 
is linear , the type of model in Problems 5 and 6 is appropriate. Now suppose that several 
observations are taken at each level of the controllable variable. On the assumption that the 
dependence is of unknown form, the type of model in Problems 1, 2, 3, and 4 is appropriate. 
An analysis-of-variance table for such a succession of models can be calculated by using the 
results from the various problems; tests and structural distributions can be obtained from a 
particular model. For examples, see Problems 13, 20, 21, and 23. 

7. Consider a process with a known error distribution and suppose that 12 observations
are taken, two at each combination of levels $A_1, A_2$ for a first factor A affecting the process
and levels $B_1, B_2, B_3$ of a second factor B affecting the process:

          B_1               B_2               B_3
A_1   y_111  y_112     y_121  y_122     y_131  y_132
A_2   y_211  y_212     y_221  y_222     y_231  y_232

If the factors were known not to affect the general response level, the measurement model 
in Chapter One would be appropriate. 

If the factor B were known not to affect the general response level, the kind of model in 
Problems 1, 2, 3, 4 would be applicable (two levels for the factor A and, accordingly, two 
structural vectors): 

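$$\prod f(e_{ijs}) \prod de_{ijs},$$

$$\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
-1 & -1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & 1 & 1 \\
y_{111} & y_{112} & y_{121} & y_{122} & y_{131} & y_{132} & y_{211} & y_{212} & y_{221} & y_{222} & y_{231} & y_{232}
\end{pmatrix}
= \begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
\mu & \rho & \sigma
\end{pmatrix}
\begin{pmatrix}
1 & \cdots & 1 \\
-1 & \cdots & 1 \\
e_{111} & \cdots & e_{232}
\end{pmatrix}.$$
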

Similarly, if the factor A were known not to affect the general response level, the kind of
model in Problems 1, 2, 3, and 4 would be applicable (three levels for the factor B and,
accordingly, three structural vectors):

$$\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
-1 & -1 & 1 & 1 & 0 & 0 & -1 & -1 & 1 & 1 & 0 & 0 \\
-1 & -1 & -1 & -1 & 2 & 2 & -1 & -1 & -1 & -1 & 2 & 2 \\
y_{111} & y_{112} & y_{121} & y_{122} & y_{131} & y_{132} & y_{211} & y_{212} & y_{221} & y_{222} & y_{231} & y_{232}
\end{pmatrix}
= \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\mu & \gamma_1 & \gamma_2 & \sigma
\end{pmatrix}
\begin{pmatrix}
1 & \cdots & 1 \\
-1 & \cdots & 0 \\
-1 & \cdots & 2 \\
e_{111} & \cdots & e_{232}
\end{pmatrix}.$$

Note that the lengths of the structural vectors have been chosen to avoid fractions.

More generally, perhaps the factors are known to have possible effects on the general 
response level but only in an additive manner: a change in factor A changes the gen- 
eral level by the same amount at each level of B and a change in factor B changes the 
general level by the same amount at each level of A. A combination of the preceding two 
models is then appropriate. 

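$$\prod f(e_{ijs}) \prod de_{ijs},$$

$$\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
-1 & -1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & 1 & 1 \\
-1 & -1 & 1 & 1 & 0 & 0 & -1 & -1 & 1 & 1 & 0 & 0 \\
-1 & -1 & -1 & -1 & 2 & 2 & -1 & -1 & -1 & -1 & 2 & 2 \\
y_{111} & y_{112} & y_{121} & y_{122} & y_{131} & y_{132} & y_{211} & y_{212} & y_{221} & y_{222} & y_{231} & y_{232}
\end{pmatrix}
= \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
\mu & \rho & \gamma_1 & \gamma_2 & \sigma
\end{pmatrix}
\begin{pmatrix}
1 & \cdots & 1 \\
-1 & \cdots & 1 \\
-1 & \cdots & 0 \\
-1 & \cdots & 2 \\
e_{111} & \cdots & e_{232}
\end{pmatrix}.$$

Note that the four structural vectors are mutually orthogonal: in particular, the second
vector describing row differences is orthogonal to the third and fourth vectors describing
column differences.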

And more generally still, if the factors are known to have possible effects on the general
response level unrestricted by additivity, the kind of model in Problems 1 and 2 would be
applicable with six general levels given by the table

          B_1       B_2       B_3
A_1    $\mu_{11}$   $\mu_{12}$   $\mu_{13}$
A_2    $\mu_{21}$   $\mu_{22}$   $\mu_{23}$
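and with a corresponding six orthogonal structural vectors. Alternatively, the model can be
structured by adding two vectors to the four structural vectors in the additive model:

$$\prod f(e_{ijs}) \prod de_{ijs},$$

$$\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
-1 & -1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & 1 & 1 \\
-1 & -1 & 1 & 1 & 0 & 0 & -1 & -1 & 1 & 1 & 0 & 0 \\
-1 & -1 & -1 & -1 & 2 & 2 & -1 & -1 & -1 & -1 & 2 & 2 \\
1 & 1 & -1 & -1 & 0 & 0 & -1 & -1 & 1 & 1 & 0 & 0 \\
1 & 1 & 1 & 1 & -2 & -2 & -1 & -1 & -1 & -1 & 2 & 2 \\
y_{111} & y_{112} & y_{121} & y_{122} & y_{131} & y_{132} & y_{211} & y_{212} & y_{221} & y_{222} & y_{231} & y_{232}
\end{pmatrix}$$

$$= \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
\mu & \rho & \gamma_1 & \gamma_2 & \alpha_1 & \alpha_2 & \sigma
\end{pmatrix}
\begin{pmatrix}
1 & \cdots & 1 \\
-1 & \cdots & 1 \\
-1 & \cdots & 0 \\
-1 & \cdots & 2 \\
1 & \cdots & 0 \\
1 & \cdots & 2 \\
e_{111} & \cdots & e_{232}
\end{pmatrix}.$$
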

(i) Check the mutual orthogonality of the structural vectors in the final model. The 
fifth vector is obtained by multiplying corresponding elements in the second and third 
vectors, the sixth vector, by multiplying corresponding elements in the second and fourth 
vectors. Justify this procedure in terms of the interpretation of vectors as describing row 
differences and column differences. 
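
The orthogonality check in part (i), and the construction of the fifth and sixth vectors as elementwise products, can be carried out mechanically. This is a minimal sketch; the names `w1` through `w6` follow the labels used in part (ii), and the helper names are illustrative.

```python
from itertools import combinations

def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))

def elementwise(a, b):
    """Product of corresponding elements of two vectors."""
    return [p * q for p, q in zip(a, b)]

# Observation order: y111, y112, y121, y122, y131, y132, y211, ..., y232.
w1 = [1] * 12                         # 1-vector
w2 = [-1] * 6 + [1] * 6               # row (factor A) difference
w3 = [-1, -1, 1, 1, 0, 0] * 2         # first column (factor B) contrast
w4 = [-1, -1, -1, -1, 2, 2] * 2       # second column (factor B) contrast
w5 = elementwise(w2, w3)              # interaction: row difference x first contrast
w6 = elementwise(w2, w4)              # interaction: row difference x second contrast

vectors = [w1, w2, w3, w4, w5, w6]
inner_products = {(i + 1, j + 1): dot(vectors[i], vectors[j])
                  for i, j in combinations(range(6), 2)}
print(w5, w6, inner_products)
```

Every off-diagonal inner product is zero, so the six vectors form an orthogonal basis for the six-dimensional subspace of general levels.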

(ii) The projection of y into the six-dimensional subspace determined by the structural
vectors $w_1, \ldots, w_6$ in the final model can be written

$$m\, w_1 + r\, w_2 + c_1 w_3 + c_2 w_4 + a_1 w_5 + a_2 w_6.$$

Give expressions for the coefficients in terms of averages such as 

Z » u . 2 

*-~v- 

(iii) Determine the positive-lower-triangular and orthogonal factorization of the matrix 
Y; use the notation m( y), . . . , a 2 (y). Record the analysis-of-variance table. 

8. (Continuation). For the case of normal error

$$\prod f(e_{ijs}) = (2\pi)^{-6} \exp\{-\tfrac{1}{2} \sum e_{ijs}^2\}$$

express the reduced model in terms of normal and chi variables (cf. Problems 4 and 6).

9. Eleven pieces of material were sampled at random from a lot; five, chosen at random,
were subjected to a first treatment, and the remaining six to a second treatment. Suppose the
regression model with normal error is applicable. Let $\mu_1$ be the level for the first
treatment and $\mu_2$ for the second treatment; and suppose the observations yield

$$\bar y_1 = 0.275, \qquad \bar y_2 = 0.293,$$
$$s_{y_1}^2 = 0.00045, \qquad s_{y_2}^2 = 0.00039.$$

Derive the structural distribution for $\mu_2 - \mu_1$ (note that variances are recorded).

10. Show that the location group L is a normal subgroup of the regression-scale group G
(Section 7).

11. Measurements are made to estimate a period of oscillation. Let $y_0$ be the measured time
when the oscillation is in a certain phase, and $y_1, \ldots, y_n$ be the measured times for
successive recurrence to that phase. Suppose the measurement error is normal: then the
regression model with equation

$$y_i = \beta_1 + \beta_2\, i + \sigma e_i$$

is applicable; $\beta_2$ is the period of oscillation. Show that the structural distribution for $\beta_2$ is
located at

$$\frac{\sum y_i (i - n/2)}{\sum (i - n/2)^2}$$

and has t-form on $n - 1$ degrees of freedom; obtain an expression for the scaling of the
t-distribution.

12. Consider the regression model with normal error, with structural relationship

$$y = \beta_1 1 + \beta_2 x + \beta_3 x^2 + \sigma e,$$

and observations

x    2.6    2.7    2.8    2.9    3.1
y   12.1   12.5   12.7   13.0   13.5

on the response y corresponding to the controllable variable x.

(i) Test the hypothesis $\beta_3 = 0$.

(ii) On the assumption that $\beta_3 = 0$ derive the structural distribution for $\beta_2$.

13. (Continuation). Two determinations of the response y were made at each of five levels of
a controllable variable x:

x    2.6    2.7    2.8    2.9    3.1
y   12.2   12.5   12.5   13.1   13.5
    12.0   12.5   12.9   12.9   13.5

Suppose the regression model is applicable: normal error and response levels
$\mu_1, \mu_2, \mu_3, \mu_4, \mu_5$ for the five levels of controllable variable. Calculate the analysis-of-variance
table with entries for: mean; linear dependence on x; quadratic dependence on x; other
dependence on x; residual. Test the hypothesis that the dependence is at most quadratic.

14. Use the orthogonal basis to show that

$$\Big| y - \sum_{u=1}^{r} b_u^{(r)}(y)\, v_u \Big|^2 = \Big| y - \sum_{u=1}^{r+1} b_u^{(r+1)}(y)\, v_u \Big|^2 + \big( b_{r+1}^{(r+1)}(y)\, |w_{r+1}| \big)^2.$$

15. Use the orthogonal basis to show that

$$\Big( y_1 - \sum_{u=1}^{r} b_u^{(r)}(y_1)\, v_u,\ y_2 - \sum_{u=1}^{r} b_u^{(r)}(y_2)\, v_u \Big) = \Big( y_1 - \sum_{u=1}^{r+1} b_u^{(r+1)}(y_1)\, v_u,\ y_2 - \sum_{u=1}^{r+1} b_u^{(r+1)}(y_2)\, v_u \Big) + b_{r+1}^{(r+1)}(y_1)\, b_{r+1}^{(r+1)}(y_2)\, (w_{r+1}, w_{r+1}).$$

16. Show that

$$\Big( y_1 - \sum_{u=1}^{r} b_u(y_1)\, v_u,\ y_2 - \sum_{u=1}^{r} b_u(y_2)\, v_u \Big) = (y_1, y_2) - \sum_u b_u(y_1)(v_u, y_2)$$
$$= (y_1, y_2) - \sum_u b_u(y_2)(v_u, y_1),$$

$$\Big| y - \sum_u b_u(y)\, v_u \Big|^2 = |y|^2 - \sum_u b_u(y)(v_u, y).$$

17. Consider the regression model but with known error scaling:

$$f(E)\, dE = \prod_{1}^{n} f(e_i) \prod_{1}^{n} de_i,$$

$$Y = \theta E,$$

where 



the simple regression model. Assume that $v_1, \ldots, v_r$ are linearly independent and $\theta$ is an
element of the regression group



(i) Determine a transformation variable and a reference point (notation of Section 3;
pattern of Section 3 in Chapter One). Check that the model is a structural model.

(ii) Derive the invariant differentials and the modular function.

(iii) Derive the distribution for error position and the structural distribution for $\theta$.

18. (Continuation). Consider the simple regression model with normal component error:

$$f(e)\, de = \frac{1}{\sqrt{2\pi}\, \sigma_0} \exp\Big\{ -\frac{e^2}{2 \sigma_0^2} \Big\}\, de.$$

Derive the distribution of error position: (a) using an orthogonal basis $w_1, \ldots, w_r$, and
(b) using the given basis $v_1, \ldots, v_r$, where

$$W = PV, \qquad V = P^{-1} W.$$
19. Suppose levels $A_1, \ldots, A_a$ are chosen for a factor A. Let $y_{is}$ be the s-th response at level
$A_i$:

$$y = \begin{pmatrix}
y_{11} & \cdots & y_{1 n_1} \\
y_{21} & \cdots & y_{2 n_2} \\
\vdots & & \vdots \\
y_{a1} & \cdots & y_{a n_a}
\end{pmatrix}
\begin{matrix} (1) \\ (2) \\ \vdots \\ (a) \end{matrix}
= (y_{is});$$

the vector is recorded in a convenient array with different rows containing responses for
different levels; $N = \sum n_i$, $y \in R^N$. Let $v_0$ be the 1-vector (every entry of the array equal to 1),
and let $v_i$ be an indicator vector for the level $A_i$ (entries 1 in row (i) of the array and 0 in
every other row).

(i) In the table check the following entries: ordering of subspaces by "contained in"
($L(v_0) \subset L(v_1, \ldots, v_a) \subset R^N$); dimension of subspace; projection into subspace; squared
length of projection.

            Space                      Dimension   Projection       Squared Length
Mean        $L(v_0)$                   1           $(\bar y)$       $N \bar y^2 = (\sum y_{is})^2 / N$
Factor A    $L(v_1, \ldots, v_a)$      a           $(\bar y_i)$     $\sum_i n_i \bar y_i^2 = \sum_i (\sum_s y_{is})^2 / n_i$
            $R^N$                      N           $(y_{is})$       $\sum y_{is}^2$

Notation. $\bar y = \sum y_{is} / N$; $\bar y_i = \sum_s y_{is} / n_i$.
(ii) Justify the elements in the corresponding analysis-of-variance table (regression model
with normal error).

Source     Dimension   Component                                                                 Structure of Component
Mean       1           $N \bar y^2$                                                              $(\sqrt{N}\, \bar\mu + \sigma z_0)^2$
Factor A   a - 1       $\sum n_i \bar y_i^2 - N \bar y^2 = \sum n_i (\bar y_i - \bar y)^2$       $(\sqrt{\sum n_i (\mu_i - \bar\mu)^2} + \sigma z_1)^2 + (\sigma \chi_{a-2})^2$
Residual   N - a       $\sum y_{is}^2 - \sum n_i \bar y_i^2 = \sum (y_{is} - \bar y_i)^2$        $\sigma^2 \chi^2_{N-a}$
Total      N           $\sum y_{is}^2$

Notation. $\bar\mu = \sum n_i \mu_i / N$; z's are independent standard normal, $\chi$'s are independent
chi-variables with degrees of freedom as subscribed.

20. Determinations were made on the yield using three methods of catalyzing a chemical 
process : 

I 47.2 49.8 48.5 48.7 

II 50.1 49.3 51.5 50.9 

III 49.1 53.2 51.2 52.8 52.3. 

Suppose that the regression model is applicable: normal error and levels $\mu_1, \mu_2, \mu_3$ for
the three methods.

(i) Calculate the analysis-of-variance table by using the enboxed expressions in Problem
19.

(ii) Derive the structural distribution for $\mu_3 - (\mu_1 + \mu_2)/2$; for $\mu_2 - \mu_1$.
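
The computing forms of Problem 19 translate directly into a short calculation for the data of Problem 20. The sketch below uses the yields as printed above; the dictionary layout and variable names are illustrative choices, and only the components are computed (no test or structural distribution).

```python
# Yields for the three catalyzing methods, as listed in Problem 20.
groups = {
    "I": [47.2, 49.8, 48.5, 48.7],
    "II": [50.1, 49.3, 51.5, 50.9],
    "III": [49.1, 53.2, 51.2, 52.8, 52.3],
}

a = len(groups)                                   # number of levels
N = sum(len(g) for g in groups.values())          # total number of responses
grand_total = sum(sum(g) for g in groups.values())

# Computing forms: (sum y)^2 / N, sum_i (sum_s y_is)^2 / n_i, sum y^2.
mean_component = grand_total ** 2 / N
level_sum = sum(sum(g) ** 2 / len(g) for g in groups.values())
total = sum(v * v for g in groups.values() for v in g)

factor_component = level_sum - mean_component
residual_component = total - level_sum

# Direct form of the residual component, for comparison.
residual_direct = sum((v - sum(g) / len(g)) ** 2
                      for g in groups.values() for v in g)

table = [
    ("Mean", 1, mean_component),
    ("Factor A", a - 1, factor_component),
    ("Residual", N - a, residual_component),
    ("Total", N, total),
]
for source, dimension, component in table:
    print(f"{source:<10}{dimension:>4}{component:>14.4f}")
```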

21. Three chemists do three, five, four determinations on the chemical content of a mixture:

I     2.6   2.9   2.8
II    3.1   3.0   3.3   2.9   3.3
III   2.9   3.2   3.0   3.1

Suppose that the regression model is applicable: normal error and levels $\mu_1, \mu_2, \mu_3$ for
the three chemists. Calculate the analysis-of-variance table and test the hypothesis
$\mu_1 = \mu_2 = \mu_3$ (the three chemists are consistent in their measurement levels).

22. Suppose levels $A_1, \ldots, A_a$ are chosen for a factor A, and levels $B_1, \ldots, B_b$ for a factor
B. Let $y_{ijs}$ be the s-th response at level $A_i$ for A and level $B_j$ for B:

$$y = \begin{pmatrix}
y_{111} \cdots y_{11n} & \cdots & y_{1b1} \cdots y_{1bn} \\
\vdots & & \vdots \\
y_{a11} \cdots y_{a1n} & \cdots & y_{ab1} \cdots y_{abn}
\end{pmatrix} = (y_{ijs}).$$
(ii) In the table check the following entries: dimension of subspace; projection into
subspace; squared length of projection.

Space   Dimension   Projection   Squared Length

Notation. $\bar y_{\cdot\cdot\cdot} = \sum y_{ijs} / abn$; $\bar y_{i\cdot\cdot} = \sum_{j,s} y_{ijs} / bn$; $\bar y_{ij\cdot} = \sum_s y_{ijs} / n$.

(iii) Justify the elements in the corresponding analysis-of-variance table (regression model
with normal error).

Source     Dimension        Component                                                                                                               Structure of Component
Mean       1                $N \bar y_{\cdot\cdot\cdot}^2$                                                                                          $(\sqrt{N}\, \bar\mu + \sigma z)^2$
A          a - 1            $bn \sum_i \bar y_{i\cdot\cdot}^2 - N \bar y^2 = bn \sum_i (\bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot})^2$
B          b - 1            $an \sum_j \bar y_{\cdot j\cdot}^2 - N \bar y^2 = an \sum_j (\bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot})^2$
A x B      (a - 1)(b - 1)   $n \sum_{ij} \bar y_{ij\cdot}^2 - \text{(preceding entries)} = n \sum_{ij} (\bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot})^2$
Residual   ab(n - 1)        $\sum y_{ijs}^2 - \text{(preceding entries)} = \sum (y_{ijs} - \bar y_{ij\cdot})^2$                                     $\sigma^2 \chi^2_{ab(n-1)}$
Total      abn              $\sum y_{ijs}^2$

(iv) Derive expressions for the missing entries under "Structure of Component."

23. A factor A (temperature) is given three levels; a factor B (pressure) is given two levels;
two determinations are made at each combination of levels.

                 B
A            18      20
A_1         14.2    18.3
            14.3    18.0
A_2         17.4    21.4
            17.6    21.0
A_3         20.7    23.8
            20.4    23.9

Suppose the regression model is applicable: normal error and level $\mu_{ij}$ for the
combination $A_i B_j$.

(i) Calculate the analysis-of-variance table by using the enboxed expressions of Problem
22.

(ii) Test the hypothesis that the effects of factors A and B are additive: $\mu_{ij} = \mu + \delta_{i\cdot} + \delta_{\cdot j}$
(i.e., no interaction A x B for factors A and B).

Notation.

$$\delta_{i\cdot} = \mu_{i\cdot} - \mu, \qquad \mu_{i\cdot} = \sum_j \mu_{ij} / b,$$
$$\mu = \sum_{ij} \mu_{ij} / ab,$$
$$\delta_{\cdot j} = \mu_{\cdot j} - \mu, \qquad \mu_{\cdot j} = \sum_i \mu_{ij} / a.$$
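
The analysis-of-variance components for a two-factor layout with replication can be computed as follows. This is a hedged sketch: it assumes that the printed data form three pairs of rows (one pair per level of A) and two columns (the levels 18 and 20 of B), which is an interpretation of the layout rather than something the text states explicitly. The variable names are illustrative.

```python
# cells[i][j] holds the two determinations at level i of A and level j of B
# (layout assumed: columns are B = 18, 20; each level of A occupies two rows).
cells = [
    [[14.2, 14.3], [18.3, 18.0]],
    [[17.4, 17.6], [21.4, 21.0]],
    [[20.7, 20.4], [23.8, 23.9]],
]
a, b, n = 3, 2, 2
N = a * b * n

all_y = [v for row in cells for cell in row for v in cell]
ybar = sum(all_y) / N
row_means = [sum(v for cell in row for v in cell) / (b * n) for row in cells]
col_means = [sum(v for row in cells for v in row[j]) / (a * n) for j in range(b)]
cell_means = [[sum(cell) / n for cell in row] for row in cells]

# Components in the "computing" form of the analysis-of-variance table.
mean_c = N * ybar ** 2
A_c = b * n * sum(m * m for m in row_means) - mean_c
B_c = a * n * sum(m * m for m in col_means) - mean_c
AB_c = n * sum(m * m for row in cell_means for m in row) - mean_c - A_c - B_c
total = sum(v * v for v in all_y)
resid_c = total - mean_c - A_c - B_c - AB_c

# Direct forms of the interaction and residual components, for comparison.
AB_direct = n * sum((cell_means[i][j] - row_means[i] - col_means[j] + ybar) ** 2
                    for i in range(a) for j in range(b))
resid_direct = sum((v - cell_means[i][j]) ** 2
                   for i in range(a) for j in range(b) for v in cells[i][j])

for source, dim, comp in [("Mean", 1, mean_c), ("A", a - 1, A_c), ("B", b - 1, B_c),
                          ("A x B", (a - 1) * (b - 1), AB_c),
                          ("Residual", a * b * (n - 1), resid_c), ("Total", N, total)]:
    print(f"{source:<10}{dim:>4}{comp:>14.4f}")
```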

*24. Suppose levels $A_1, \ldots, A_a$ are chosen for a factor A, levels $B_1, \ldots, B_b$ for a factor B,
levels $C_1, \ldots, C_c$ for a factor C. Let $y_{ijks}$ be the s-th response at the combination $A_i B_j C_k$
($s = 1, \ldots, n$). Let $v_{000}$ be the 1-vector; let $v_{i00}$, $v_{0j0}$, $v_{00k}$, $v_{ij0}$, $v_{i0k}$, $v_{0jk}$, $v_{ijk}$ be indicator
vectors for $A_i$, $B_j$, $C_k$, $A_i B_j$, $A_i C_k$, $B_j C_k$, $A_i B_j C_k$, respectively.

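[Figure for Problem 24: the subspaces $L(v_{000})$; $L(\{v_{i00}\})$, $L(\{v_{0j0}\})$, $L(\{v_{00k}\})$; $L(\{v_{ij0}\})$, $L(\{v_{i0k}\})$, $L(\{v_{0jk}\})$; $L(\{v_{ijk}\})$; $R^N$, with arrows indicating "contained in."]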
(i) Check the ordering (arrows) by "contained-in"; check the orthogonality of extensions
(see the accompanying figure).

(ii) In the table check the following entries: dimension, projection. Derive two expressions
for each squared length.

        Space                  Dimension   Projection                           Squared Length
Mean    $L(v_{000})$           1           $(\bar y_{\cdot\cdot\cdot\cdot})$
A       $L(\{v_{i00}\})$       a           $(\bar y_{i\cdot\cdot\cdot})$
B       $L(\{v_{0j0}\})$       b           $(\bar y_{\cdot j\cdot\cdot})$
C       $L(\{v_{00k}\})$       c           $(\bar y_{\cdot\cdot k\cdot})$
AB      $L(\{v_{ij0}\})$       ab          $(\bar y_{ij\cdot\cdot})$
BC      $L(\{v_{0jk}\})$       bc          $(\bar y_{\cdot jk\cdot})$
AC      $L(\{v_{i0k}\})$       ac          $(\bar y_{i\cdot k\cdot})$
ABC     $L(\{v_{ijk}\})$       abc         $(\bar y_{ijk\cdot})$
        $R^N$                  abcn        $(y_{ijks})$


(iii) Derive expressions for the entries in the corresponding analysis-of-variance table. 
This is a three-factor factorial design. 

*25. The location and scale subgroups are examined for the regression model
(Section 7) by using full matrix notation and are examined for the progression
model (Section 11) by using the location-scale symbol $[\cdot, \cdot]$ of Problem 27, Chapter One.
Check the details in Section 7 using the location-scale symbol, and the details in Section
11 using the full matrix notation.

*26. For the progression model derive expressions for the structural distributions,
$g_L^*(\mu : Y)\, d\mu$ for $\mu$ and $g_S^*(\mathcal{T} : Y)\, d\mathcal{T}$ for $\mathcal{T}$ (Section 11).

*27. (Continuation). Derive the marginal structural distributions for $\mu$ and $\mathcal{T}$ for the case
of normal error (Section 12).

*28. Derive the general Wishart distribution (Notes and References).

29. Consider the progression group 



operating on points 



in Euclidean space $R^{pn}$ by matrix multiplication.

(i) Check that G is a group and that the group is unitary on $R^{pn}$ provided n > p and
certain trivial points are excluded.
(ii) In the pattern of Section 10 define a transformation variable and derive the invariant
differential on $R^{pn}$ and the left and right invariant differentials on G.

(iii) From properties of the invariant differentials deduce the value of the Jacobian 



30. (Continuation). Consider the simple progression model for n observations on p
responses:

f(E) dE,


where $\mathcal{T}$ is an element of the group G.

(i) Derive the distribution of the error position variable [E] given orbit.

(ii) Derive the structural distribution for $\mathcal{T}$.

31. (Continuation). For the case of standard normal errors derive:

(i) The distribution of error position [E] given orbit.

(ii) The distribution of the error inner product matrix $S(E) = EE' = [E][E]'$ given
orbit.

(iii) The structural distribution for $\mathcal{T}$.

(iv) The structural distribution for $\Sigma = \mathcal{T}\mathcal{T}'$.

*32. Regression-progression model. Consider an error variable E:



with error distribution

$$f(E)\, dE = \prod_{i=1}^{n} f(e_{1i}, \ldots, e_{pi}) \prod_{i=1}^{n} (de_{1i} \cdots de_{pi}).$$

Consider a quantity 
or, equivalently (general location-scale notation in Problem 27, Chapter One),

$$\theta = [\mathcal{B}, \mathcal{T}].$$

And consider a response matrix Y:



The regression-progression model is

$$f(E)\, dE, \quad Y = \theta E, \qquad \text{or} \qquad f(E)\, dE, \quad Y = \mathcal{B} V + \mathcal{T} E.$$

(i) Check the equivalence of the two kinds of notation: full matrix; location-scale with
matrix arguments.

(ii) Consider the regression-progression group:

$$G = \left\{ \begin{pmatrix} I & 0 \\ B & T \end{pmatrix} : \ B \text{ is a } p \times r \text{ matrix},\ T \text{ is } p \times p \text{ positive-lower-triangular} \right\}.$$

Check that G is a group. Describe the orbits on $R^{pn}$ using the L+ notation; show that G is
unitary on $R^{pn}$ if n > r + p and a certain degenerate set of points is deleted.

*33. (Continuation). Define a variable [Y]:



and a point D(Y) in $R^{pn}$:
Use the regression coefficients of Section 3 and the natural extension of the regression
coefficients and residual unit vectors in Section 10. Show that [Y] is a transformation variable
and D(Y) is a reference point; check the alternative notation: [B(Y), T(Y)] and D(Y).

*34. (Continuation). Verify the following invariant differentials:


dmm ~Wr ■ 

j t \ d S dBdT. 

w = i7u == f WWa’ 


Mg) = 
AQr) = 


dg _dBdT 

i<?iv m v 
l£lv my 

\g\A m. r m A - 


*35 ( Continuation ). Derive the following distributions: 


g([E]:D) d[E) = k(D)f([E]D) |[£]|» 



/ r 



\ 

/ 

v Xi 


d li 

\ 

(' 

s. Vri J 

+ T 

_ d pi^ 

) 


I c 7i— (r+l) . . . -n— (r+j») dB dT 
| id) 5 (jj) aaai. 


g*(d: Y) dO = k(D)f(d~i 7) ^ dv(0) 


r ~\ 


k(D) IT/ I ■s -1 
1 = 1 


Vu 


- 31 


\ ff (D ■ ■ ■ 


W)---s}MY) d ® dis 


sl+\Y)---s^(Y) 


*36. (Continuation). Derive the location distribution $g_L(H : D)\, dH$ for the error variable†
$H = T^{-1} B$; derive the structural distribution $g_L^*(\mathcal{B} : Y)\, d\mathcal{B}$ for $\mathcal{B}$. Derive the scale
distribution $g_S(T : D)\, dT$ for the error variable $T = T(E)$. Derive the structural
distribution $g_S^*(\mathcal{T} : Y)\, d\mathcal{T}$ for $\mathcal{T}$.


*37. (Continuation). For the case of normal error

$$f(E)\, dE = (2\pi)^{-np/2} \exp\{-\tfrac{1}{2} \operatorname{tr} E E'\}\, dE,$$

determine the form of the distributions in Problem 35 and structural distributions in
Problem 36.


*38. (Continuation). Derive the standard Wishart distribution and the general Wishart
distribution for the regression-progression model. Note the similarity in form to the
distribution for the progression model case: a change of degrees of freedom.

39. Consider a matrix V of structural vectors and a matrix Y of corresponding response
vectors; let

$$\begin{pmatrix} V \\ Y \end{pmatrix} \begin{pmatrix} V \\ Y \end{pmatrix}' = \begin{pmatrix} VV' & VY' \\ YV' & YY' \end{pmatrix} = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}.$$

(i) Show that the regression coefficients of Y on V are $S_{21} S_{11}^{-1}$: coefficients for the first
response are in the first row, . . . , coefficients for the last response are in the last row.

(ii) Show that the matrix of residual vectors is $Y - S_{21} S_{11}^{-1} V$.

(iii) Show that the inner-product matrix of residuals is $S_{22} - S_{21} S_{11}^{-1} S_{12}$.

(iv) Consider the inner-product matrix for a general residual Y - BV:

$$(Y - BV)(Y - BV)'.$$

Complete the quadratic form in B, and thereby show that $B = S_{21} S_{11}^{-1}$ gives a minimum
inner-product matrix of residuals.
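
The claims of Problem 39 can be illustrated numerically for a small case. The sketch below assumes r = 2 structural vectors, p = 2 responses, and n = 5 observations with hypothetical values; the helpers `matmul`, `transpose`, and `inv2` are written out here and are not from the text.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

# Hypothetical structural vectors (rows of V) and response vectors (rows of Y).
V = [[1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0, 5.0]]
Y = [[2.1, 3.9, 6.2, 7.8, 10.1], [1.0, 0.5, 0.7, 0.2, 0.1]]

S11 = matmul(V, transpose(V))
S12 = matmul(V, transpose(Y))
S21 = matmul(Y, transpose(V))
S22 = matmul(Y, transpose(Y))
B = matmul(S21, inv2(S11))            # regression coefficients of Y on V

def residual_inner_product(Bm):
    """Return (Y - Bm V)(Y - Bm V)' together with the residual matrix."""
    fitted = matmul(Bm, V)
    R = [[Y[i][j] - fitted[i][j] for j in range(5)] for i in range(2)]
    return matmul(R, transpose(R)), R

S_min, R_min = residual_inner_product(B)
schur = [[S22[i][j] - matmul(B, S12)[i][j] for j in range(2)] for i in range(2)]

# A perturbed coefficient matrix gives a larger inner-product matrix:
# the difference S_other - S_min is positive semidefinite.
B_other = [[B[0][0] + 0.1, B[0][1]], [B[1][0], B[1][1] - 0.05]]
S_other, _ = residual_inner_product(B_other)
diff = [[S_other[i][j] - S_min[i][j] for j in range(2)] for i in range(2)]
print(B, S_min, diff)
```

The residual rows are orthogonal to the rows of V, which is why the cross terms vanish when the quadratic form in B is completed.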

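40. (i) Determine W to satisfy the (r + p)-partitioned-matrix multiplication,

$$\begin{pmatrix} I & 0 \\ W & I \end{pmatrix} \begin{pmatrix} A & C \\ B & D \end{pmatrix} = \begin{pmatrix} A & C \\ 0 & D - B A^{-1} C \end{pmatrix},$$

and show that

$$\begin{vmatrix} A & C \\ B & D \end{vmatrix} = |A|\, |D - B A^{-1} C|.$$

(ii) Show that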




where K 1 is p x r, K 2 is r x p, and I in each case is an appropriate identity matrix. 
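
The determinant factorization of a partitioned matrix, $|A\ C;\ B\ D| = |A|\,|D - BA^{-1}C|$, can be checked on a small exact example. This sketch assumes r = 2 and p = 1 with arbitrarily chosen integer entries; the helper functions are illustrative and not from the text.

```python
from fractions import Fraction as F

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

# Blocks of the partitioned matrix (hypothetical entries): A is 2 x 2, D is 1 x 1.
A = [[F(2), F(1)], [F(1), F(3)]]
C = [[F(1)], [F(2)]]
B = [[F(4), F(5)]]
D = [[F(6)]]
full = [A[0] + C[0], A[1] + C[1], B[0] + D[0]]

detA = det2(A)
A_inv = [[A[1][1] / detA, -A[0][1] / detA],
         [-A[1][0] / detA, A[0][0] / detA]]

# W = -B A^{-1} eliminates the lower-left block.
BA_inv = [sum(B[0][k] * A_inv[k][j] for k in range(2)) for j in range(2)]
W = [-entry for entry in BA_inv]
lower_left = [W[0] * A[0][j] + W[1] * A[1][j] + B[0][j] for j in range(2)]

# Schur complement D - B A^{-1} C (a scalar here).
schur = D[0][0] - sum(BA_inv[k] * C[k][0] for k in range(2))

lhs = det3(full)
rhs = detA * schur
print(lhs, rhs)
```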


† Note. H is a p x r matrix, an analog of the vector t in the regression and progression
models; the capital T has been used already for the positive-lower-triangular scale matrix.
Conditional Analysis


The structural model describes a certain kind of process or system: the form 
of the error distribution is known from theory or from experience with the 
same or related systems; the physical quantity is a transformation from a 
group, the transformation that carries an error value into a response value. 
The presence of the group is important; it permits the identification of 
characteristics of an error value; and with multiple observations the group 
allows the form of the error distribution to be determined; it allows the 
observer to see into the system. 

Not every physical quantity can be identified as a transformation from a
group. This chapter examines a variety of models in which part of the
physical quantity is a transformation from a group. Chapters Seven and
Eight examine models in which no part of the quantity is a transformation
from a group.

With a structural model a change in the quantity is a change in a
transformation that applies to the error value; it produces a corresponding change
in the response. Without this structuring relationship linking the quantity
and the response, there remains only the frequency distribution that describes
possible response values for each value for the quantity: the classical model of
statistics. For the models in this chapter, the transformation part of the
quantity can be analyzed in the framework of a structural model, and the
remaining part can be analyzed by methods appropriate to the classical
model.

1 PROBABILITY AND LIKELIHOOD FUNCTIONS

Consider a model with a response variable x taking possible values in a
space X and with a quantity $\theta$ taking possible values in a space $\Omega$. Suppose
there is no structuring relationship by which a change in $\theta$ can be related to a
change in x; specifically, suppose there is the minimum for a statistical
model: a frequency distribution $f(x : \theta)$ for the response variable x for each
value for $\theta$, the classical model of statistics. Suppose also that the space X
has a countable number of points with $f(x : \theta)$ as the probability function, or,
alternatively, that the space X is an open subset of a Euclidean space with
$f(x : \theta)$ as the probability density function relative to Euclidean volume. The
statistical model then has a frequency distribution $f(x : \theta)$ for x in X with $\theta$ in $\Omega$,
and it has an observed value $x_0$ for the response variable x.

Consider first the case of a countable space $\mathcal{X}$. Within the model an
observed $x_0$ has its labeling $x_0$ and its probability of occurrence
$f(x_0 : \theta)$ as a function of the possible values $\theta$ for the
quantity. The labeling $x_0$ is irrelevant because of the assumed absence of
any structure relating the quantity to the response. Within the model, then, an
observed value has only the function $f(x_0 : \theta)$ of $\theta$ as its
essential identification; two points $x'$, $x''$ with the same function
$f(x' : \theta) = f(x'' : \theta)$ are not distinguished. See Figure 1. The
reduced statistical model then has a frequency distribution $f(x : \theta)$ and
it has a realized function $f(x_0 : \theta)$ giving probability of occurrence
as a function of $\theta$. Some inference methods, particularly for a large
number of observations, are examined in Chapter Seven. A basic principle uses
the function $f(x_0 : \theta)$ to assess the various possible values for
$\theta$, perhaps choosing as a single preferred value the value
$\hat\theta$ that maximizes the function $f(x_0 : \theta)$.

Now consider the case of an open set $\mathcal{X}$ in Euclidean space. Within
the model an observed $x_0$ has only its labeling $x_0$ and its probability of
occurrence $f(x_0 : \theta)\,dx_0$ (in an element $dx_0$ that includes $x_0$).
Again the labeling is irrelevant because of the assumed absence of any
structure relating the quantity to the response. Within the model, then, an
observed value $x_0$ has only the function $f(x_0 : \theta)\,dx_0$ (with $dx_0$
unspecified in magnitude) as its essential identification. This can be
expressed more compactly by defining the likelihood function from the observed
$x_0$:

\[ L(x_0 : \theta) = \{ k f(x_0 : \theta) : 0 < k < \infty \}. \]

Figure 1 The probability function for an observed value $x_0$.

† In most analyses a single letter can be used to designate both a variable and
a corresponding realized value, the distinction being made by the context. It
is convenient, however, in the present context to make the distinction explicit
and use a subscript 0 to distinguish an observed or realized value.
Figure 2 Representative elements $k'f(x_0 : \theta)$, $k''f(x_0 : \theta)$,
$k'''f(x_0 : \theta)$ of the likelihood function $L(x_0 : \theta)$. The
likelihood ratio $L^*(x_0 : \theta)$ relative to $\theta_0$.


The likelihood function, in fact, is a set of functions of $\theta$, all of the
same form and differing only in the positive multiplicative constant $k$. This
definition accommodates the unspecified element $dx_0$. See Figure 2. A second,
more formal expression for the likelihood function is

\[ L(x_0 : \theta) = R^+(x_0)\, f(x_0 : \theta), \]

where $R^+(x)$ is the map that carries any point $x$ in $\mathcal{X}$ to the
set $R^+ = (0, \infty)$ of positive real numbers. Note that
$kR^+(x_0) = R^+$ for any positive number $k$. If $f(x_0 : \theta) \neq 0$ for
each $\theta$ then the unspecified constant can be avoided by using a
likelihood ratio relative to some reference value $\theta_0$:

\[ L^*(x_0 : \theta) = \frac{f(x_0 : \theta)}{f(x_0 : \theta_0)}. \]

The likelihood function can be expressed alternatively as the log-likelihood
function from the observed $x_0$:

\[ l(x_0 : \theta) = \{ c + \ln f(x_0 : \theta) : -\infty < c < \infty \} = R(x_0) + \ln f(x_0 : \theta), \]

where $R(x)$ is the map that carries any point $x$ in $\mathcal{X}$ to the set
$R = (-\infty, \infty)$ of real numbers. The log-likelihood function, in fact,
is a set of functions of $\theta$, all of the same form and differing only in
the additive constant $c$. See Figure 3. The value $-\infty$ must be allowed
for $\ln f(x_0 : \theta)$ to correspond to the value 0 for $f(x_0 : \theta)$.
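The point that the likelihood function is a class of functions, fixed only up to the constant $k$ (or the additive constant $c$ on the log scale), can be checked numerically. The sketch below assumes a Poisson probability function for the response, which is an illustrative choice and not part of the text; the names `poisson_pmf`, `likelihood_ratio`, and `log_likelihood_difference` are hypothetical helpers.

```python
import math

def poisson_pmf(x, theta):
    """Probability function f(x : theta) of a Poisson response (assumed example model)."""
    return math.exp(-theta) * theta ** x / math.factorial(x)

def likelihood_ratio(f, x0, theta, theta_ref):
    """L*(x0 : theta) = f(x0 : theta) / f(x0 : theta_ref); a constant factor k cancels."""
    return f(x0, theta) / f(x0, theta_ref)

def log_likelihood_difference(f, x0, theta, theta_ref):
    """Difference of log-likelihood values; an additive constant c cancels."""
    return math.log(f(x0, theta)) - math.log(f(x0, theta_ref))

x0 = 4                                        # observed response (made up)
grid = [0.5 * j for j in range(1, 21)]        # candidate values 0.5, 1.0, ..., 10.0
ratios = [likelihood_ratio(poisson_pmf, x0, t, 1.0) for t in grid]
theta_hat = grid[ratios.index(max(ratios))]   # value on the grid that maximizes the likelihood

# Another representative k * f(x0 : theta) of the same likelihood function
k = 37.5
scaled = lambda x, t: k * poisson_pmf(x, t)
scaled_ratios = [likelihood_ratio(scaled, x0, t, 1.0) for t in grid]

print(theta_hat)
```

Any representative of the class gives the same ratios and the same maximizing value, which is why the ratio relative to a reference value removes the unspecified constant.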

Consider further the case of an open set $\mathcal{X}$ in Euclidean space.
Within the model an observed value $x_0$ has only the likelihood function
$L(x_0 : \theta)$ as its essential identification; the likelihood function
gives the relative probability of occurrence of $x_0$ under various possible
values for $\theta$; two points $x'$, $x''$ with the same likelihood function
are not distinguished. The reduced statistical model then has a frequency
function $f(x : \theta)$ and it has a realized likelihood function
$L(x_0 : \theta)$ giving relative probability of occurrence as a function of
$\theta$. Some inference methods, particularly for a large number of
observations, are examined in Chapter Eight. A basic principle uses the
function $L(x_0 : \theta)$ to assess, one with respect to another, the various
values for $\theta$, perhaps choosing as a single preferred value the value
$\hat\theta$ that maximizes the likelihood function $L(x_0 : \theta)$.

Figure 3 Representative elements $c' + \ln f(x_0 : \theta)$,
$c'' + \ln f(x_0 : \theta)$, $c''' + \ln f(x_0 : \theta)$,
$c'''' + \ln f(x_0 : \theta)$ for the log-likelihood function
$l(x_0 : \theta)$.

2 A CONDITIONAL MODEL AND MARGINAL LIKELIHOOD 

Consider now a model that is partly structural and partly classical. Let $E$
be an error variable on an open set $\mathcal{X}$ in a Euclidean space $R^N$.
Let $\theta$ be a primary quantity, an element of a unitary group $G$ of
transformations of $\mathcal{X}$ onto $\mathcal{X}$ (with Assumption 3 in
Chapter Two). Let $X$ be an observed response that is produced by the
transformation $\theta$ applied to a realized error value $E$. And suppose that
the error distribution

\[ f(E : \lambda)\, dE \]

is known except for an additional quantity $\lambda$. This gives the

Conditional Structural Model

\[ f(E : \lambda)\, dE, \]
\[ X = \theta E, \]

with additional quantity $\lambda$. The model has an error variable $E$ with
distribution dependent on the quantity $\lambda$; and it has a structural
equation in which a realized error value $E$ is transformed by the quantity
$\theta$ to give the response $X$. If the additional quantity $\lambda$ is
known in value, then the conditional structural model is an ordinary structural
model.


Now suppose there is no outside information concerning $\theta$ and $\lambda$.
For an assumed value for the quantity $\lambda$ the structural model produces a
reduced structural model

\[ [X] = \theta [E], \]

where

\[ g_\lambda([E] : D)\, d[E] = k_\lambda(D)\, f([E]D : \lambda)\, J_N(E)\, d\mu([E]). \]

The corresponding structural distribution for $\theta$ conditional on
$\lambda$ is

\[ g_\lambda^*(\theta : X)\, d\theta = k_\lambda(D)\, f(\theta^{-1}X : \lambda)\, J_N(\theta^{-1}X)\, \Delta([X])\, d\nu(\theta). \]

For an assumed value for $\lambda$ this distribution is the basis for inference
concerning $\theta$.

Now consider inference concerning $\lambda$. The structural equation gives no
information concerning the error position $[E]$, but it gives the value of the
error orbit $GE = GX$. The distribution that describes the origin of the error
position $[E]$ is a distribution that involves $\lambda$; it can give no
information concerning $\lambda$ without the realized position $[E]$. This
leaves only the known orbit $GE = GX$ and the distribution that describes its
origin. The distribution that describes the origin of the orbit $GE = GX$
involves the quantity $\lambda$; it is a classical model. The likelihood
function for the known orbit based on this classical model is now derived and
is called the marginal likelihood function for $\lambda$.

The probability element for $E$ based on Euclidean volume is

\[ f(E : \lambda)\, dE. \]

The conditional probability element for $[E]$ given the orbit $D(E) = D$ is

\[ k_\lambda(D)\, f([E]D : \lambda)\, \frac{J_N(E)}{J_L([E])}\, d[E]. \]

The marginal probability element for the orbit $D = D(E)$ can be obtained
by dividing the full element by the conditional element:

\[ \frac{1}{k_\lambda(D)}\, \frac{J_L([E])}{J_N(E)}\, \frac{dE}{d[E]}. \]

The marginal element based on differentials at the point $X$ rather than at $E$
on the orbit $D$ is

\[ \frac{1}{k_\lambda(D)}\, \frac{J_L([X])}{J_N(X)}\, \frac{dX}{d[X]}. \]

The likelihood function based on $D$ is

\[ L(D : \lambda) = R^+(D)\, \frac{1}{k_\lambda(D)}; \]

the omitted factors do not involve $\lambda$ and have been incorporated into
$R^+(D)$. This is the marginal likelihood function for $\lambda$. This marginal
likelihood function is the basis for inference concerning the quantity
$\lambda$.

3 THE MEASUREMENT MODEL WITH AN ERROR QUANTITY 

Consider the measurement model of Chapter One and suppose that the 
error distribution 

f(e:p)de 

involves a shape or form quantity p. As examples consider 
ffe :p) = k exp {-* |e|*}, P > °> 

f 2 (e:p) = k(l + exp {-^X^exp |J- 
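For $n$ measurements the model is

\[ \prod_{1}^{n} f(e_i : p) \prod_{1}^{n} de_i, \]
\[ \mathbf{x} = [\mu, \sigma]\,\mathbf{e}. \]

This is a conditional structural model with additional quantity $p$.

For an assumed value for the quantity $p$ the reduced structural model is

\[ k_p(\mathbf{d}) \prod_{1}^{n} f(\bar e + s_e d_i : p)\; s_e^{\,n}\, \frac{d\bar e\, ds_e}{s_e^{2}}, \]
\[ [\bar x, s_x] = [\mu, \sigma][\bar e, s_e]. \]

The structural distribution for $[\mu, \sigma]$ conditional on $p$ (Section 18,
Chapter One) is

\[ k_p(\mathbf{d}) \prod_{1}^{n} f\!\left(\frac{x_i - \mu}{\sigma} : p\right) \left(\frac{s_x}{\sigma}\right)^{n} \frac{d\mu\, d\sigma}{s_x\,\sigma}. \]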

The probability element for $\mathbf{e}$ based on Euclidean volume is

\[ \prod_{1}^{n} f(e_i : p) \prod_{1}^{n} de_i. \]

The conditional probability element for $[\bar e, s_e]$ given the orbit
$\mathbf{d}$ is

\[ k_p(\mathbf{d}) \prod_{1}^{n} f(\bar e + s_e d_i : p)\; \frac{s_e^{\,n}}{s_e^{2}}\, d\bar e\, ds_e; \]

it is expressed in terms of Euclidean area on the positive affine group. The
vector $[\bar e, s_e]\mathbf{d}$ is a point on the orbit through $\mathbf{d}$
in Euclidean space $R^n$; let $d[\bar e, s_e]\mathbf{d}$ designate Euclidean
area on this orbit. The ratio of Euclidean area on the orbit to Euclidean area
on the group is

\[ K([\bar e, s_e] : \mathbf{d}) = \frac{d[\bar e, s_e]\mathbf{d}}{d[\bar e, s_e]} = \sqrt{n^2 - n}; \]

see the example at the end of Section 3, Chapter Two; also Figure 15,
Chapter One. The conditional probability element for $[\bar e, s_e]$ can then
be written

\[ k_p(\mathbf{d}) \prod_{1}^{n} f(\bar e + s_e d_i : p)\; \frac{s_e^{\,n}}{s_e^{2}\sqrt{n^2 - n}}\, d[\bar e, s_e]\mathbf{d} \]

as based on Euclidean area on the orbit; see Figure 4.

Figure 4 Euclidean measure on the orbit: $\sqrt{n}\,d\bar e \cdot \sqrt{n-1}\,ds_e$.
Euclidean volume measure cross-sectional to the orbit:
$\prod de_i \big/ \big(\sqrt{n^2 - n}\; d\bar e\, ds_e\big)$.

The marginal probability element for the orbit $\mathbf{d} = \mathbf{d}(\mathbf{e})$
is obtained by division:

\[ \frac{1}{k_p(\mathbf{d})}\, \frac{\sqrt{n^2 - n}}{s_e^{\,n-2}}\, \frac{\prod de_i}{d[\bar e, s_e]\mathbf{d}}; \]

it is expressed in terms of $(n - 2)$-dimensional Euclidean volume
cross-sectional to the orbit. The marginal probability element for the orbit as
based on $(n - 2)$-dimensional Euclidean volume $dv$ at the observation
vector $\mathbf{x}$ is

\[ \frac{1}{k_p(\mathbf{d})}\, \frac{\sqrt{n^2 - n}}{s_x^{\,n-2}}\, dv. \]

The marginal likelihood function for $p$ is

\[ L(\mathbf{d} : p) = R^+(\mathbf{d})\, \frac{1}{k_p(\mathbf{d})}. \]

The structural distribution for $[\mu, \sigma]$ provides the basis for
inference concerning $[\mu, \sigma]$ for an assumed value for $p$; the marginal
likelihood function provides the basis for inference concerning $p$.
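The marginal likelihood for a shape quantity is the reciprocal of the norming constant, so it can be approximated by integrating $\prod f(\bar e + s_e d_i)\, s_e^{\,n-2}$ over $\bar e$ and $s_e$. The sketch below uses a truncated midpoint rule and, purely as an assumed illustration, a Student family indexed by degrees of freedom as the shape family; that family, the data values, and the names `standardize` and `inv_norming_constant` are not from the text. For the normal density the integral has the closed form $(2\pi)^{-n/2}\sqrt{2\pi/n}\,\tfrac12\Gamma((n-1)/2)\,(2/(n-1))^{(n-1)/2}$, which gives a check on the numerical routine.

```python
import math

def standardize(x):
    """Return (mean, s, d) with x = mean + s*d, sum(d) = 0 and sum(d^2) = n - 1."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    return mean, s, [(xi - mean) / s for xi in x]

def inv_norming_constant(density, d, e_lim=8.0, s_lim=8.0, m=200):
    """Midpoint-rule value of 1/k(d): the integral of prod f(e + s*d_i) * s^(n-2)
    over -e_lim < e < e_lim and 0 < s < s_lim (truncated ranges)."""
    n = len(d)
    he, hs = 2 * e_lim / m, s_lim / m
    total = 0.0
    for a in range(m):
        e = -e_lim + (a + 0.5) * he
        for b in range(m):
            s = (b + 0.5) * hs
            prod = 1.0
            for di in d:
                prod *= density(e + s * di)
            total += prod * s ** (n - 2)
    return total * he * hs

def normal(e):
    return math.exp(-0.5 * e * e) / math.sqrt(2 * math.pi)

def student(df):
    """Student density with df degrees of freedom (assumed shape family)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return lambda e: c * (1 + e * e / df) ** (-(df + 1) / 2)

x = [9.8, 10.4, 10.1, 13.9, 9.9]            # made-up measurements
_, _, d = standardize(x)
marginal = {df: inv_norming_constant(student(df), d) for df in (2, 5, 20)}
marginal["normal"] = inv_norming_constant(normal, d)
for shape, value in marginal.items():
    print(shape, value)
```

Only ratios of these values are meaningful, since the likelihood is determined up to a positive constant. For heavy-tailed members the truncation of the integration range makes the value approximate.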


4 A COMPOSITE RESPONSE MODEL

Consider a sequence of $p$ response variables $y_1, \ldots, y_p$ and suppose
that the distribution

\[ f(e_1, \ldots, e_p : \beta) \]

of the related error sequence $e_1, \ldots, e_p$ has been identified, except
for a quantity $\beta$ describing form, shape, linking, or other characteristic
or combination of characteristics. Let $\mu_j$ be the general level of the
$j$th response and $\sigma_j$ be the error scaling for the $j$th response. For
$n$ observations on the composite response the following model is obtained:

\[ \prod_{i=1}^{n} f(e_{1i}, \ldots, e_{pi} : \beta) \prod_{i=1}^{n} de_{1i} \cdots de_{pi}, \]
\[ \mathbf{y}_1 = [\mu_1, \sigma_1]\,\mathbf{e}_1, \]
\[ \vdots \]
\[ \mathbf{y}_p = [\mu_p, \sigma_p]\,\mathbf{e}_p. \]

The model is a conditional structural model ($n \ge 2$) with additional
quantity $\beta$. An example of a bivariate error distribution with additional
quantity $\rho$ is

\[ \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left\{ -\frac{e_1^2 - 2\rho e_1 e_2 + e_2^2}{2(1 - \rho^2)} \right\}, \]

the bivariate normal error distribution with correlation $\rho$.

The quantity $[\mu_j, \sigma_j]$ belongs to the positive affine group

\[ G_j = \{ [a_j, c_j] : -\infty < a_j < \infty,\; 0 < c_j < \infty \} \]

on the Euclidean space $R^n$ of the $j$th response vector. The composite
quantity $([\mu_1, \sigma_1], \ldots, [\mu_p, \sigma_p])$ belongs to a group

\[ G = \{ ([a_1, c_1], \ldots, [a_p, c_p]) : [a_j, c_j] \in G_j \}, \]

a direct product of the groups $G_1, \ldots, G_p$ (actually the $p$th power of
the positive affine group since $G_1, \ldots, G_p$ are equivalent); the
multiplication is coordinate-by-coordinate:

\[ ([a_1, c_1], \ldots, [a_p, c_p]) \cdot ([A_1, C_1], \ldots, [A_p, C_p]) = ([a_1, c_1][A_1, C_1], \ldots, [a_p, c_p][A_p, C_p]). \]

The invariant differentials and modular function can be obtained by combining
those for the component groups.

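For an assumed value for the quantity $\beta$ the reduced structural model is

\[ k_\beta(\mathbf{d}_1, \ldots, \mathbf{d}_p) \prod_{i=1}^{n} f(\bar e_1 + s_{e_1} d_{1i}, \ldots, \bar e_p + s_{e_p} d_{pi} : \beta) \cdot (s_{e_1} \cdots s_{e_p})^{n} \prod_{j=1}^{p} \frac{d\bar e_j\, ds_{e_j}}{s_{e_j}^{2}}, \]
\[ [\bar y_1, s_{y_1}] = [\mu_1, \sigma_1][\bar e_1, s_{e_1}], \]
\[ \vdots \]
\[ [\bar y_p, s_{y_p}] = [\mu_p, \sigma_p][\bar e_p, s_{e_p}]. \]

The structural distribution for the composite quantity conditional on $\beta$
is

\[ k_\beta(\mathbf{d}_1, \ldots, \mathbf{d}_p) \prod_{i=1}^{n} f\!\left(\frac{y_{1i} - \mu_1}{\sigma_1}, \ldots, \frac{y_{pi} - \mu_p}{\sigma_p} : \beta\right) \left(\frac{s_{y_1} \cdots s_{y_p}}{\sigma_1 \cdots \sigma_p}\right)^{n} \prod_{j=1}^{p} \frac{d\mu_j\, d\sigma_j}{s_{y_j}\sigma_j}. \]

The marginal probability element for the orbit

\[ (\mathbf{d}_1, \ldots, \mathbf{d}_p) = (\mathbf{d}_1(\mathbf{e}_1), \ldots, \mathbf{d}_p(\mathbf{e}_p)) \]

can be obtained by the methods in the preceding section applied to each
response vector:

\[ \frac{1}{k_\beta(\mathbf{d}_1, \ldots, \mathbf{d}_p)}\, \frac{(n^2 - n)^{p/2}}{(s_{y_1} \cdots s_{y_p})^{n-2}}\, dv; \]

the element $dv$ measures $(n - 2)p$-dimensional Euclidean volume
cross-sectional to the orbit at the observed composite response. The marginal
likelihood function for $\beta$ is

\[ L(\mathbf{d}_1, \ldots, \mathbf{d}_p : \beta) = R^+(\mathbf{d}_1, \ldots, \mathbf{d}_p)\, \frac{1}{k_\beta(\mathbf{d}_1, \ldots, \mathbf{d}_p)}. \]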

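For an example consider the bivariate normal error distribution with
correlation $\rho$. The normalizing constant for the conditional error
distribution can be obtained by integration:

\[
\begin{aligned}
k_\rho^{-1}(\mathbf{d}_1, \mathbf{d}_2)
&= \frac{1}{(2\pi)^{n}(1 - \rho^2)^{n/2}} \int \exp\left\{ -\sum_i \frac{(\bar e_1 + s_{e_1} d_{1i})^2 - 2\rho(\bar e_1 + s_{e_1} d_{1i})(\bar e_2 + s_{e_2} d_{2i}) + (\bar e_2 + s_{e_2} d_{2i})^2}{2(1 - \rho^2)} \right\} \\
&\qquad\qquad \cdot\, (s_{e_1} s_{e_2})^{n-2}\, d\bar e_1\, d\bar e_2\, ds_{e_1}\, ds_{e_2} \\
&= \int \frac{n}{2\pi(1 - \rho^2)^{1/2}} \exp\left\{ -n\,\frac{\bar e_1^2 - 2\rho \bar e_1 \bar e_2 + \bar e_2^2}{2(1 - \rho^2)} \right\} d\bar e_1\, d\bar e_2 \\
&\qquad \cdot\, \frac{1}{(2\pi)^{n-1}\, n\, (1 - \rho^2)^{(n-1)/2}} \int \exp\left\{ -(n - 1)\,\frac{s_{e_1}^2 - 2\rho r s_{e_1} s_{e_2} + s_{e_2}^2}{2(1 - \rho^2)} \right\} (s_{e_1} s_{e_2})^{n-2}\, ds_{e_1}\, ds_{e_2} \\
&= \frac{2^{n-1}(1 - \rho^2)^{(n-1)/2}}{(2\pi)^{n-1}\, n\, (n - 1)^{n-1}} \int \exp\{ -t_1^2 - t_2^2 + 2\rho r t_1 t_2 \}\, (t_1 t_2)^{n-2}\, dt_1\, dt_2 \\
&= \frac{2^{n-1}(1 - \rho^2)^{(n-1)/2}}{(2\pi)^{n-1}\, n\, (n - 1)^{n-1}} \sum_{a=0}^{\infty} \frac{(2\rho r)^{a}}{a!} \int_0^\infty e^{-t_1^2} t_1^{\,n-2+a}\, dt_1 \int_0^\infty e^{-t_2^2} t_2^{\,n-2+a}\, dt_2 \\
&= \frac{2^{n-3}(1 - \rho^2)^{(n-1)/2}}{(2\pi)^{n-1}\, n\, (n - 1)^{n-1}} \sum_{a=0}^{\infty} \frac{(2\rho r)^{a}}{a!}\, \Gamma^2\!\left(\frac{n - 1 + a}{2}\right) \\
&= \frac{2^{n-3}(1 - \rho^2)^{(n-1)/2}}{(2\pi)^{n-1}\, n\, (n - 1)^{n-1}}\, H_{n-1}(\rho r).
\end{aligned}
\]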

The preceding simplification involves a number of steps. In the first step
the error density function is substituted. In the second step the terms in the
exponential are expanded and rearranged using

\[ \sum d_{ji} = 0, \qquad \sum d_{ji}^2 = n - 1, \qquad \sum d_{1i} d_{2i} = (n - 1) r, \]

and expressed in terms of the sample correlation coefficient $r$,

\[ r = \frac{(n - 1)^{-1} \sum (y_{1i} - \bar y_1)(y_{2i} - \bar y_2)}{s_{y_1} s_{y_2}} = \frac{(n - 1)^{-1} \sum (e_{1i} - \bar e_1)(e_{2i} - \bar e_2)}{s_{e_1} s_{e_2}} = \frac{\sum d_{1i} d_{2i}}{n - 1}, \]

between $\mathbf{y}_1$ and $\mathbf{y}_2$, between $\mathbf{e}_1$ and
$\mathbf{e}_2$, or between $\mathbf{d}_1$ and $\mathbf{d}_2$; the first part
of the expression, an integral, has value 1. In the third step the substitution

\[ t_j = \frac{\sqrt{n - 1}\; s_{e_j}}{\sqrt{2}\,\sqrt{1 - \rho^2}} \]

is made. In the remaining steps the cross-term in the exponential is expanded
in a series and is integrated term by term.
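The term-by-term integration can be checked numerically: with $z = \rho r$, the double integral of $\exp\{-t_1^2 - t_2^2 + 2z t_1 t_2\}(t_1 t_2)^{n-2}$ over the positive quadrant should equal $\tfrac14 \sum_a (2z)^a\, \Gamma^2((n-1+a)/2)/a!$, that is, $\tfrac14 H_{n-1}(z)$. The sketch below compares a midpoint-rule value with the series; the function names, the truncation of the integration range, and the number of series terms are assumptions made for the illustration.

```python
import math

def H(m, z, terms=400):
    """H_m(z) = sum over a >= 0 of (2z)^a / a! * Gamma((m + a)/2)^2,
    with each term formed on the log scale to avoid overflow (needs |z| < 1)."""
    if z == 0.0:
        return math.gamma(m / 2) ** 2
    total = 0.0
    for a in range(terms):
        log_term = (a * math.log(abs(2 * z)) - math.lgamma(a + 1)
                    + 2 * math.lgamma((m + a) / 2))
        sign = -1.0 if (z < 0 and a % 2 == 1) else 1.0
        total += sign * math.exp(log_term)
    return total

def double_integral(n, z, t_lim=9.0, m=450):
    """Midpoint rule for the integral of exp(-t1^2 - t2^2 + 2 z t1 t2) (t1 t2)^(n-2)
    over 0 < t1, t2 < t_lim (the tail beyond t_lim is negligible for |z| < 1/2)."""
    h = t_lim / m
    pts = [(j + 0.5) * h for j in range(m)]
    total = 0.0
    for t1 in pts:
        for t2 in pts:
            total += math.exp(-t1 * t1 - t2 * t2 + 2 * z * t1 * t2) * (t1 * t2) ** (n - 2)
    return total * h * h

n, z = 5, 0.4
print(double_integral(n, z), 0.25 * H(n - 1, z))
```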

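The conditional error distribution given the orbit then has the form

\[
\begin{aligned}
&\frac{k_\rho(\mathbf{d}_1, \mathbf{d}_2)}{(2\pi)^{n}(1 - \rho^2)^{n/2}} \exp\left\{ -\sum_i \frac{(\bar e_1 + s_{e_1} d_{1i})^2 - 2\rho(\bar e_1 + s_{e_1} d_{1i})(\bar e_2 + s_{e_2} d_{2i}) + (\bar e_2 + s_{e_2} d_{2i})^2}{2(1 - \rho^2)} \right\} \\
&\qquad\qquad \cdot\, (s_{e_1} s_{e_2})^{n-2}\, d\bar e_1\, d\bar e_2\, ds_{e_1}\, ds_{e_2} \\
&\quad = \frac{n}{2\pi(1 - \rho^2)^{1/2}} \exp\left\{ -n\,\frac{\bar e_1^2 - 2\rho \bar e_1 \bar e_2 + \bar e_2^2}{2(1 - \rho^2)} \right\} d\bar e_1\, d\bar e_2 \\
&\qquad \cdot\, \frac{(n - 1)^{n-1}}{2^{n-3}(1 - \rho^2)^{n-1} H_{n-1}(\rho r)} \exp\left\{ -(n - 1)\,\frac{s_{e_1}^2 - 2\rho r s_{e_1} s_{e_2} + s_{e_2}^2}{2(1 - \rho^2)} \right\} (s_{e_1} s_{e_2})^{n-2}\, ds_{e_1}\, ds_{e_2}.
\end{aligned}
\]

And the marginal probability element for the orbit
$(\mathbf{d}_1, \mathbf{d}_2)$ at the observed response is

\[ \frac{1}{k_\rho(\mathbf{d}_1, \mathbf{d}_2)}\, \frac{n^2 - n}{(s_{y_1} s_{y_2})^{n-2}}\, dv = \frac{2^{n-3}(1 - \rho^2)^{(n-1)/2} H_{n-1}(\rho r)}{(2\pi)^{n-1}(n - 1)^{n-2}(s_{y_1} s_{y_2})^{n-2}}\, dv. \]

The marginal likelihood for $\rho$ is

\[ L(\mathbf{d}_1, \mathbf{d}_2 : \rho) = R^+(\mathbf{d}_1, \mathbf{d}_2)\,(1 - \rho^2)^{(n-1)/2} H_{n-1}(\rho r). \]

This can be expressed as a ratio relative to $\rho = 0$:

\[ L^*(\mathbf{d}_1, \mathbf{d}_2 : \rho) = \frac{(1 - \rho^2)^{(n-1)/2} H_{n-1}(\rho r)}{H_{n-1}(0)} = \frac{(1 - \rho^2)^{(n-1)/2} H_{n-1}(\rho r)}{\Gamma^2((n - 1)/2)}. \]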
The marginal likelihood function $L^*(\mathbf{d}_1, \mathbf{d}_2 : \rho)$
depends on the orbit $(\mathbf{d}_1, \mathbf{d}_2)$, but it is seen to depend
on the orbit only in terms of the correlation coefficient $r$. The probability
element for the orbit $(\mathbf{d}_1, \mathbf{d}_2)$ for general $\rho$ can
be related to the element for $\rho = 0$:

\[ h(\mathbf{d}_1, \mathbf{d}_2 : \rho)\, d(\mathbf{d}_1, \mathbf{d}_2) = L^*(\mathbf{d}_1, \mathbf{d}_2 : \rho)\, h(\mathbf{d}_1, \mathbf{d}_2 : 0)\, d(\mathbf{d}_1, \mathbf{d}_2). \]

The distribution for $(\mathbf{d}_1, \mathbf{d}_2)$ has its dependence on
$\rho$ isolated in the factor $L^*$ and this factor depends on the orbit
$(\mathbf{d}_1, \mathbf{d}_2)$ only in terms of the correlation $r$. An
integration to obtain the marginal distribution of
$r = (n - 1)^{-1} \sum d_{1i} d_{2i}$ is an integration over variation in
$\mathbf{d}_1$, $\mathbf{d}_2$ for fixed $r$. In this integration $L^*$ is a
constant factor; hence, if

\[ h(r : 0)\, dr \]

is the marginal distribution for $r$ with $\rho = 0$, then

\[ h(r : \rho)\, dr = \frac{(1 - \rho^2)^{(n-1)/2} H_{n-1}(\rho r)}{\Gamma^2((n - 1)/2)}\, h(r : 0)\, dr \]

is the marginal distribution of $r$ for general correlation $\rho$. The general
distribution is obtained by likelihood modulation of the special distribution.

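For $\rho = 0$ the distribution of $r$ has element

\[ \frac{\Gamma((n - 1)/2)}{\Gamma(\tfrac12)\,\Gamma((n - 2)/2)}\,(1 - r^2)^{(n-4)/2}\, dr = \frac{2^{n-3}}{\pi\,\Gamma(n - 2)}\,\Gamma^2\!\left(\frac{n - 1}{2}\right)(1 - r^2)^{(n-4)/2}\, dr, \qquad -1 < r < 1; \]

note that $\Gamma(2p) = 2^{2p-1}\,\Gamma(p)\,\Gamma(p + \tfrac12)/\Gamma(\tfrac12)$.
This distribution can be established by using the normal regression theory in
Chapter Three to show that

\[ \frac{\sqrt{n - 2}\; r}{\sqrt{1 - r^2}} \]

has a $t$-distribution on $n - 2$ degrees of freedom conditionally given
$\mathbf{e}_1$, hence marginally.

The general distribution for the correlation coefficient $r$ is then

\[ \frac{2^{n-3}}{\pi\,\Gamma(n - 2)}\,(1 - \rho^2)^{(n-1)/2}\, H_{n-1}(\rho r)\,(1 - r^2)^{(n-4)/2}\, dr, \qquad -1 < r < 1. \]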
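The series form of the distribution of $r$ and the likelihood ratio for $\rho$ lend themselves to a direct numerical check. The sketch below assumes $H_{n-1}(z) = \sum_a (2z)^a\,\Gamma^2((n-1+a)/2)/a!$ as in the integration above; the function names, the truncation at 300 terms, the grid of $\rho$ values, and the sample values $n = 10$, $r = 0.6$ are illustrative choices only.

```python
import math

def H(m, z, terms=300):
    """H_m(z) = sum over a >= 0 of (2z)^a / a! * Gamma((m + a)/2)^2 (needs |z| < 1)."""
    if z == 0.0:
        return math.gamma(m / 2) ** 2
    total = 0.0
    for a in range(terms):
        log_term = (a * math.log(abs(2 * z)) - math.lgamma(a + 1)
                    + 2 * math.lgamma((m + a) / 2))
        sign = -1.0 if (z < 0 and a % 2 == 1) else 1.0
        total += sign * math.exp(log_term)
    return total

def corr_density(r, rho, n):
    """Density of the sample correlation r for n bivariate normal observations."""
    const = 2 ** (n - 3) / (math.pi * math.gamma(n - 2))
    return (const * (1 - rho * rho) ** ((n - 1) / 2)
            * H(n - 1, rho * r) * (1 - r * r) ** ((n - 4) / 2))

def likelihood_ratio(rho, r, n):
    """Marginal likelihood ratio L*(rho) relative to rho = 0."""
    return ((1 - rho * rho) ** ((n - 1) / 2) * H(n - 1, rho * r)
            / math.gamma((n - 1) / 2) ** 2)

n, r_obs = 10, 0.6
rho_grid = [j / 100 for j in range(-95, 96)]
rho_hat = max(rho_grid, key=lambda rho: likelihood_ratio(rho, r_obs, n))

# Midpoint-rule check that the density integrates to 1 for rho = 0.5
m = 1000
h = 2.0 / m
total = sum(corr_density(-1 + (j + 0.5) * h, 0.5, n) for j in range(m)) * h
print(rho_hat, total)
```

The density for general $\rho$ is the density for $\rho = 0$ multiplied by `likelihood_ratio`, which is the likelihood modulation described in the text.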


5 THE MEASUREMENT MODEL ON THE CIRCLE 


Consider a surveyor measuring a direction in the horizontal plane, or a
physicist measuring a directional property on a plane surface, or an
oceanographer measuring the direction of a wave train; these are applications
of the measurement model on the circle.

Let the vector $(1, 0)$ be a reference direction on the plane $R^2$, and let a
point $\mathbf{e} = (e_1, e_2)'$ on the unit circle give the error angle $e$
measured positively from the reference direction (see Figure 5). Suppose the
error distribution has been identified except perhaps for an additional
quantity $\kappa$:

\[ f(\mathbf{e} : \kappa)\, d\mathbf{e} = f(e_1, e_2 : \kappa)\, de. \]

The vector $\mathbf{e} = (e_1, e_2)'$ is restricted to the unit circle, and the
differential $d\mathbf{e} = de$ measures length on the unit circle.

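The physical quantity is the general direction of the property investigated.
Let $\theta$ be this direction as designated by the rotation

\[ \theta = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \]

through an angle $\alpha$ from the reference direction. The quantity $\theta$
belongs to the positive orthogonal group of rotations on the plane:

\[ G = \left\{ \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} : 0 \le \alpha < 2\pi \right\}. \]

For a single observation let $\mathbf{x} = (x_1, x_2)'$ be the measured
direction. The model is

\[ f(e_1, e_2 : \kappa)\, de, \]
\[ \mathbf{x} = \theta\,\mathbf{e}. \]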
For multiple observations let

\[ X = (\mathbf{x}_1, \ldots, \mathbf{x}_n) = \begin{pmatrix} x_{11} & \cdots & x_{1n} \\ x_{21} & \cdots & x_{2n} \end{pmatrix} \]

designate $n$ measured directions and let

\[ E = (\mathbf{e}_1, \ldots, \mathbf{e}_n) \]

designate the corresponding realized error or the corresponding error
variable. This gives the

Measurement Model on the Circle

\[ \prod_{1}^{n} f(\mathbf{e}_i : \kappa) \prod_{1}^{n} d\mathbf{e}_i, \]
\[ X = \theta E. \]

The model has an error distribution describing the multiple measurement
process, and it has a structural equation in which a realized error $E$ has
determined the relation between the measurement $X$ and the quantity $\theta$.
The model is a conditional structural model with additional quantity $\kappa$.
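Consider the effect of a transformation $g$:

\[ gX = g(\mathbf{x}_1, \ldots, \mathbf{x}_n) = (g\mathbf{x}_1, \ldots, g\mathbf{x}_n). \]

The transformation $g$ takes the $n$ points $\mathbf{x}_1, \ldots, \mathbf{x}_n$
on the unit circle and rotates them through an angle $\alpha$: The relative
position of the points remains the same; the general placement on the circle is
changed by a rotation through the angle $\alpha$. To describe the position of
the $n$ points let

\[ \mathbf{a}(X) = \begin{pmatrix} a_1(X) \\ a_2(X) \end{pmatrix} = \begin{pmatrix} \cos a(X) \\ \sin a(X) \end{pmatrix} \]

be the unit vector in the direction of the sum vector $\sum \mathbf{x}_i$.
Let $l(X)$ be the length of the sum vector. Then

\[ l^2(X) = \Big(\sum x_{1i}\Big)^2 + \Big(\sum x_{2i}\Big)^2, \qquad \mathbf{a}(X) = \frac{\sum \mathbf{x}_i}{l(X)}. \]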


Figure 6 The array $X$; the location vector $\mathbf{a}(X)$; the reference
array $D$.

See Figure 6. A transformation variable can be constructed as the rotation
through the angle $a(X)$:

\[ [X] = \begin{pmatrix} a_1(X) & -a_2(X) \\ a_2(X) & a_1(X) \end{pmatrix} = \begin{pmatrix} \cos a(X) & -\sin a(X) \\ \sin a(X) & \cos a(X) \end{pmatrix}. \]

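The corresponding reference point is

\[
\begin{aligned}
D(X) = (\mathbf{d}_1(X), \ldots, \mathbf{d}_n(X)) = [X]^{-1}X
&= \begin{pmatrix} a_1(X) & a_2(X) \\ -a_2(X) & a_1(X) \end{pmatrix} X \\
&= \begin{pmatrix} a_1(X)x_{11} + a_2(X)x_{21} & \cdots & a_1(X)x_{1n} + a_2(X)x_{2n} \\ -a_2(X)x_{11} + a_1(X)x_{21} & \cdots & -a_2(X)x_{1n} + a_1(X)x_{2n} \end{pmatrix}.
\end{aligned}
\]

Note that the sum vector for $D(X)$ is in the reference direction and has
length $l(D(X)) = l(X)$:

\[ \sum \mathbf{d}_i(X) = [X]^{-1} \sum \mathbf{x}_i = [X]^{-1}\, l(X)\,\mathbf{a}(X) = l(X) \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \]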
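The construction of the location vector, the transformation variable, and the reference array can be traced on a small set of directions. The sketch below represents each measured direction by its angle in radians; the data values and the function names `location` and `reference_array` are made up for the illustration.

```python
import math

def location(angles):
    """Return (l, a1, a2): the length l(X) of the sum vector and the unit
    location vector a(X) = (a1, a2)' for points on the unit circle."""
    s1 = sum(math.cos(t) for t in angles)
    s2 = sum(math.sin(t) for t in angles)
    l = math.hypot(s1, s2)
    return l, s1 / l, s2 / l

def reference_array(angles):
    """Apply [X]^{-1} to each point; returns the columns d_i(X) of D(X)."""
    _, a1, a2 = location(angles)
    cols = []
    for t in angles:
        x1, x2 = math.cos(t), math.sin(t)
        cols.append((a1 * x1 + a2 * x2, -a2 * x1 + a1 * x2))
    return cols

angles = [0.9, 1.3, 1.1, 0.4, 1.8]          # made-up measured directions
l, a1, a2 = location(angles)
D = reference_array(angles)
sum_d = (sum(c[0] for c in D), sum(c[1] for c in D))

# Rotating every observation by the same angle leaves the reference array unchanged
D_rotated = reference_array([t + 0.7 for t in angles])
print(l, sum_d)
```

The sum of the columns of `D` lies along the reference direction with length `l`, and a common rotation of the data changes only the location vector, not the reference array.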




The invariant differentials are the Euclidean differentials for the unit vectors 
involved (i.e., lengths on unit circles). 

The conditional distribution of the error position $\mathbf{a} = \mathbf{a}(E)$
given the orbit $D$ is

\[ g_\kappa(\mathbf{a} : D)\, d\mathbf{a} = k_\kappa(D) \prod_{1}^{n} f(a_1 d_{1i} - a_2 d_{2i},\; a_2 d_{1i} + a_1 d_{2i} : \kappa)\, d\mathbf{a}. \]

As an example consider the normal error distribution for the circle,

\[ f(e_1, e_2 : \kappa) = \frac{1}{2\pi I_0(\kappa)} \exp\{\kappa e_1\} = \frac{1}{2\pi I_0(\kappa)} \exp\{\kappa \cos e\}. \]

The normalizing constant involves the imaginary Bessel function of zero
order,

\[ I_0(\kappa) = \frac{1}{2\pi} \int_0^{2\pi} \exp\{\kappa \cos e\}\, de. \]

The characteristic $\kappa$ describes precision: With $\kappa = 0$ the
distribution is uniform on the circle; with a large positive $\kappa$ the
distribution is concentrated on the circle near $(1, 0)$:

\[ f(e_1, e_2 : \kappa) = k' \exp\{\kappa(1 - \tfrac12 e^2 + \cdots)\} \approx k'' \exp\{-\tfrac12 \kappa e^2\}. \]

The normal error distribution for the circle can be obtained from a symmetric
normal distribution in the plane by conditioning to the unit circle (relative
to the partition by circles about the origin).

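The conditional distribution of the error position $\mathbf{a} = \mathbf{a}(E)$
given the orbit $D$ is

\[
\begin{aligned}
g_\kappa(\mathbf{a} : D)\, d\mathbf{a}
&= k_\kappa(D)\, \frac{1}{(2\pi I_0(\kappa))^{n}} \exp\Big\{ \sum \kappa\,(a_1 d_{1i} - a_2 d_{2i}) \Big\}\, d\mathbf{a} \\
&= k_\kappa(D)\, \frac{1}{(2\pi I_0(\kappa))^{n}} \exp\{ l\kappa a_1 \}\, d\mathbf{a} \\
&= \frac{1}{2\pi I_0(l\kappa)} \exp\{ l\kappa \cos a \}\, da.
\end{aligned}
\]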

The conditional distribution for the error position is also a normal error
distribution for the circle but with precision
$l\kappa = l(X)\kappa = l(D)\kappa$; the distribution depends on the orbital
variable $D = D(X)$ but only in terms of the real variable $l = l(X)$. For an
assumed value for the quantity $\kappa$ the reduced structural model is

\[ \frac{1}{2\pi I_0(l(X)\kappa)} \exp\{ l(X)\kappa \cos a \}\, da, \]
\[ \mathbf{a}(X) = \theta\,\mathbf{a}. \]

The structural distribution for the angle $\alpha$ conditional on the quantity
$\kappa$ is

\[ \frac{1}{2\pi I_0(l(X)\kappa)} \exp\{ l(X)\kappa \cos(a(X) - \alpha) \}\, d\alpha. \]

The marginal likelihood function is

\[ L(D : \kappa) = R^+(D)\, \frac{1}{k_\kappa(D)} = R^+(D)\, \frac{I_0(l\kappa)}{I_0^{\,n}(\kappa)}. \]

The marginal likelihood can be expressed as a ratio relative to $\kappa = 0$:

\[ L^*(D : \kappa) = \frac{I_0(l\kappa)}{I_0^{\,n}(\kappa)} \cdot \frac{I_0^{\,n}(0)}{I_0(0)} = \frac{I_0(l\kappa)}{I_0^{\,n}(\kappa)}. \]
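The ratio $I_0(l\kappa)/I_0^{\,n}(\kappa)$ is easy to evaluate from the power series $I_0(x) = \sum_k (x^2/4)^k/(k!)^2$. The sketch below works on the log scale and locates the maximizing $\kappa$ on a grid; the grid limits, the stopping rule for the series, and the sample values of $l$ and $n$ are assumptions for the illustration.

```python
import math

def log_bessel_i0(x):
    """log I_0(x), summing the series sum_k (x^2/4)^k / (k!)^2 until terms are negligible."""
    q = 0.25 * x * x
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return math.log(total)

def log_likelihood_ratio(kappa, l, n):
    """log L*(D : kappa) = log I_0(l * kappa) - n * log I_0(kappa)."""
    return log_bessel_i0(l * kappa) - n * log_bessel_i0(kappa)

def grid_maximizer(l, n, steps=2000, step=0.01):
    """Value of kappa on the grid 0, step, ..., steps*step with the largest likelihood."""
    grid = [j * step for j in range(steps + 1)]
    return max(grid, key=lambda kappa: log_likelihood_ratio(kappa, l, n))

n = 10
print(grid_maximizer(9.0, n))   # concentrated directions: sum vector nearly of length n
print(grid_maximizer(0.5, n))   # scattered directions: short sum vector
```

Since $\log L^*(\kappa) \approx \kappa^2 (l^2 - n)/4$ for small $\kappa$, a short sum vector puts the maximum at $\kappa = 0$, whereas a sum vector with $l^2 > n$ gives a positive maximizing value.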


It is of interest that the marginal likelihood function depends on the
orbit only in terms of the real variable

\[ l = l(D). \]

It follows that the marginal distribution for the orbit $D$ involves only $l$
in its dependence on $\kappa$. And it follows as in the preceding section that,
if

\[ h(l : 0)\, dl \]

is the marginal distribution of the length $l$ with $\kappa = 0$, then

\[ h(l : \kappa)\, dl = \frac{I_0(l\kappa)}{I_0^{\,n}(\kappa)}\, h(l : 0)\, dl \]

is the marginal distribution of the length $l$ for general $\kappa$.

The distribution of the length $l$ of the sum vector based on the uniform
distribution on the circle ($\kappa = 0$) is available from probability theory:

\[ h(l : 0)\, dl = l \int_0^\infty J_0(lu)\, J_0^{\,n}(u)\, u\, du \cdot dl, \qquad 0 < l < \infty, \]

where $J_0(u)$ is the Bessel function of zero order. The general distribution
for $l$ is then

\[ h(l : \kappa)\, dl = \frac{I_0(l\kappa)}{I_0^{\,n}(\kappa)}\, l \int_0^\infty J_0(lu)\, J_0^{\,n}(u)\, u\, du \cdot dl, \qquad 0 < l < \infty. \]

*6 MARGINAL LIKELIHOOD: EXTENSIONS 

The conditional structural model can be extended in two directions. In
some contexts it may occur that the response variable $X$ has been
inappropriately expressed and that a transformation of $X$,

\[ X_\lambda = l(X : \lambda), \]

dependent on some aspect of the additional quantity $\lambda$ is in reality the
natural response variable: A value $X_\lambda$ of the natural response is
produced by the transformation $\theta$ applied to a realized error value $E$.
For each $\lambda$ suppose that $X_\lambda = l(X : \lambda)$ is a one-to-one
continuously differentiable function from the range of $X$ onto the space
$\mathcal{X}$.

As a second extension suppose that the group $G$ applies to the space
$\mathcal{X}$ in a way that depends on $\lambda$. Let $\theta$ as a
transformation on $\mathcal{X}$ be designated $\theta_\lambda$, and let $G$ as
a group of transformations on $\mathcal{X}$ be designated $G_\lambda$.
These two extensions give a generalized

Conditional Structural Model

\[ f(E : \lambda)\, dE, \]
\[ X_\lambda = \theta_\lambda E, \]

with additional quantity† $\lambda$. The model has an error variable $E$ with
distribution dependent on $\lambda$; and it has a structural equation in which
a realized error value $E$ is transformed by the quantity $\theta_\lambda$ in
$G_\lambda$ to give the natural response $X_\lambda$; the natural response
$X_\lambda$ is related to the observed response $X$ by the equation
$X_\lambda = l(X : \lambda)$. If the additional quantity $\lambda$ is known in
value, then the conditional structural model is an ordinary structural model.

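For analysis, let $G_\lambda X_\lambda$ be the orbit of $X_\lambda$ under the
transformation group $G_\lambda$,

\[ G_\lambda X_\lambda = \{ g_\lambda X_\lambda : g_\lambda \in G_\lambda \}. \]

Let $[X_\lambda]_\lambda$ be a transformation variable relative to $G_\lambda$,
and let $D_\lambda(X_\lambda)$ be the corresponding reference point:

\[ X_\lambda = [X_\lambda]_\lambda\, D_\lambda(X_\lambda). \]

See Figure 8.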

Figure 8 The orbit $G_\lambda X_\lambda$ passes through the natural response
$X_\lambda$. The inverse image of the orbit $G_\lambda X_\lambda$ relative to
the map $X_\lambda = l(X : \lambda)$ passes through the observed response $X$.

† The error distribution, the transformation group, and the form of the
response depend on the quantity $\lambda$. The quantity $\lambda$ may, indeed,
have separate coordinates, one for each effect.

For an assumed value for the quantity $\lambda$ the structural model produces a
reduced structural model

\[ [X_\lambda]_\lambda = \theta_\lambda [E]_\lambda, \]

where

\[ g_\lambda([E]_\lambda : D_\lambda)\, d[E]_\lambda = k_\lambda(D_\lambda)\, f([E]_\lambda D_\lambda : \lambda)\, J_N([E]_\lambda : D_\lambda)\, d\mu([E]_\lambda). \]

The corresponding structural distribution for $\theta$ conditional on
$\lambda$ is

\[ g_\lambda^*(\theta : X)\, d\theta = k_\lambda(D_\lambda)\, f(\theta_\lambda^{-1} X_\lambda : \lambda)\, J_N(\theta_\lambda^{-1}[X_\lambda]_\lambda : D_\lambda)\, \Delta([X_\lambda]_\lambda)\, d\nu(\theta). \]

For an assumed value for $\lambda$ this distribution is the basis for inference
concerning $\theta$.

The probability element for $E$ based on Euclidean volume is

\[ f(E : \lambda)\, dE. \]

The conditional probability element for $[E]_\lambda$ given the orbit
$D_\lambda$ is

\[ k_\lambda(D_\lambda)\, f([E]_\lambda D_\lambda : \lambda)\, \frac{J_N([E]_\lambda : D_\lambda)}{J_L([E]_\lambda)}\, d[E]_\lambda. \]

The marginal probability element for the orbit $D_\lambda$ can then be obtained
by dividing the full element by the conditional element:

\[ \frac{1}{k_\lambda(D_\lambda)}\, \frac{J_L([E]_\lambda)}{J_N([E]_\lambda : D_\lambda)}\, \frac{dE}{d[E]_\lambda}. \]

The marginal element at the point $X_\lambda$ on the orbit $D_\lambda$ rather
than at the point $E$ on the orbit $D_\lambda$ is

\[ \frac{1}{k_\lambda(D_\lambda)}\, \frac{J_L([X_\lambda]_\lambda)}{J_N([X_\lambda]_\lambda : D_\lambda)}\, \frac{dX_\lambda}{d[X_\lambda]_\lambda}. \]

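The differential $dX_\lambda$ can be expressed in terms of differential
Euclidean volume for the observable variable:

\[ dX_\lambda = \left| \frac{\partial l(X : \lambda)}{\partial X} \right| dX. \]

The differential $d[X_\lambda]_\lambda$ on the group can be expressed in terms
of differential Euclidean volume in the $L$ dimensions along the orbit:

\[ d[X_\lambda]_\lambda = \frac{1}{K([X_\lambda]_\lambda : D_\lambda)}\, d[X_\lambda]_\lambda D_\lambda, \qquad K([E]_\lambda : D_\lambda) = \left| \frac{\partial [E]_\lambda D_\lambda}{\partial [E]_\lambda} \right|. \]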
The differential $d[X_\lambda]_\lambda D_\lambda$ along the orbit $D_\lambda$
can then be expressed in terms of differential Euclidean volume for $X$ along
the inverse image of the orbit $D_\lambda$:

\[ d[X_\lambda]_\lambda D_\lambda = \left| \frac{\partial l^{-1}([X_\lambda]_\lambda D_\lambda)}{\partial [X_\lambda]_\lambda D_\lambda} \right|^{-1} dl^{-1}([X_\lambda]_\lambda D_\lambda) \]

(Figure 8). Section 7 presents an example in which a combined effect
of the preceding two differential adjustments can be calculated in a single
step.

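The marginal probability element for the orbit $D_\lambda$ as based on
cross-sectional Euclidean volume $dv$ at the observed response value $X$ is
then

\[ \frac{1}{k_\lambda(D_\lambda)}\, \frac{J_L([X_\lambda]_\lambda)\, K([X_\lambda]_\lambda : D_\lambda)}{J_N([X_\lambda]_\lambda : D_\lambda)}\, \frac{\left| \dfrac{\partial l(X : \lambda)}{\partial X} \right|}{\left| \dfrac{\partial l^{-1}([X_\lambda]_\lambda D_\lambda)}{\partial [X_\lambda]_\lambda D_\lambda} \right|^{-1}}\, dv. \]

The marginal likelihood for $\lambda$ from the orbit $D$ as observed at $X$ is

\[ L(D : \lambda) = R^+(D_\lambda)\, \frac{J_L([X_\lambda]_\lambda)\, K([X_\lambda]_\lambda : D_\lambda)}{k_\lambda(D_\lambda)\, J_N([X_\lambda]_\lambda : D_\lambda)}\, \frac{\left| \dfrac{\partial l(X : \lambda)}{\partial X} \right|}{\left| \dfrac{\partial l^{-1}([X_\lambda]_\lambda D_\lambda)}{\partial [X_\lambda]_\lambda D_\lambda} \right|^{-1}}. \]

The marginal likelihood for $\lambda$ is the basis for inference concerning the
quantity $\lambda$.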


7 THE TRANSFORMED REGRESSION MODEL 

Consider again the regression model of Chapter Three. In some potential
applications, familiarity with related systems may indicate a regression
model with structural vectors $\mathbf{v}_1, \ldots, \mathbf{v}_r$ and error
distribution $f(\mathbf{e})\, d\mathbf{e}$, but the familiarity may leave doubt
about the appropriate manner of expression for the response variable; for
example, a response variable may be expressed, on first approach, as a variable
$y$, yet detailed investigation may show that some transformed variable such as
$\ln y$, $y^{1/2}$, or $y^{-1}$ may be the appropriate variable for the
regression model. A transformation having parameter $\lambda$ that includes
these three transformed variables is

\[ y^{(\lambda)} = \begin{cases} y^{\lambda}, & \lambda \neq 0, \\ \ln y, & \lambda = 0 \end{cases} \]

(the transformation applies to a positive variable $y$).

Consider the response variable expressed as $y$, and suppose that familiarity
with related systems indicates that the regression model can be applied to the
transformed variable

\[ y^{(\lambda)} = l(y, \lambda) \]

for some value of the additional quantity $\lambda$. Suppose that
$l(y, \lambda)$ is a one-to-one continuously differentiable function and let
$Y_\lambda$ designate the response expressed in terms of the transformed values
$y_i^{(\lambda)} = l(y_i, \lambda)$.

The transformed regression model with additional quantity $\lambda$ can then be
expressed as

\[ f(E)\, dE, \]
\[ Y_\lambda = \theta E. \]

Suppose that $n \ge r + 1$ and that $\mathbf{v}_1, \ldots, \mathbf{v}_r$ are
linearly independent. For given $\lambda$ the model is a structural model. The
model then is a conditional structural model with additional quantity
$\lambda$.

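The conditional distribution of the error position $[E]$ given $\lambda$ and
the orbit $D(E) = D(Y_\lambda) = D_\lambda$ is

\[ g([E] : D_\lambda)\, d[E] = k(D_\lambda)\, f([E]D_\lambda)\, \frac{s^{n}(\mathbf{e})}{s^{r+1}(\mathbf{e})}\, d[E], \]

and for the case of normal error is

\[ \frac{|VV'|^{1/2}}{(2\pi)^{r/2}} \exp\{ -\tfrac12\, \mathbf{b}'VV'\mathbf{b} \}\, d\mathbf{b} \cdot \frac{A_{n-r}}{(2\pi)^{(n-r)/2}} \exp\left\{ -\frac{s^2}{2} \right\} s^{n-r-1}\, ds. \]

For an assumed value for $\lambda$ the reduced structural model is

\[ g([E] : D(Y_\lambda))\, d[E], \]
\[ [Y_\lambda] = \theta [E]. \]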

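For an assumed value for $\lambda$ the structural distribution for $\theta$ is

\[ k(D_\lambda) \prod_{1}^{n} f\!\left( \frac{y_i^{(\lambda)} - \sum_j \beta_j v_{ji}}{\sigma} \right) \left( \frac{s(y^{(\lambda)})}{\sigma} \right)^{n} \frac{d\boldsymbol\beta\, d\sigma}{s^{r}(y^{(\lambda)})\,\sigma}, \]

and for the case of normal error is

\[ \frac{|VV'|^{1/2}}{(2\pi)^{r/2}\sigma^{r}} \exp\left\{ -\frac{1}{2\sigma^2} (\boldsymbol\beta - \mathbf{b}(y^{(\lambda)}))'\, VV'\, (\boldsymbol\beta - \mathbf{b}(y^{(\lambda)})) \right\} d\boldsymbol\beta \cdot \frac{A_{n-r}}{(2\pi)^{(n-r)/2}} \exp\left\{ -\frac{s^2(y^{(\lambda)})}{2\sigma^2} \right\} \left( \frac{s(y^{(\lambda)})}{\sigma} \right)^{n-r} \frac{d\sigma}{\sigma}. \]

For an assumed value for $\lambda$ this provides the basis for inference
concerning $\theta$.

The marginal element for $D_\lambda$ based on differentials at the point $E$ is

\[ \frac{f(E)\, dE}{g([E] : D_\lambda)\, d[E]} = \frac{1}{k(D_\lambda)}\, \frac{s^{r+1}(\mathbf{e})}{s^{n}(\mathbf{e})}\, \frac{dE}{d[E]}, \]

and based on differentials at the point $Y_\lambda$ is

\[ \frac{1}{k(D_\lambda)}\, \frac{s^{r+1}(y^{(\lambda)})}{s^{n}(y^{(\lambda)})}\, \frac{dY_\lambda}{d[Y_\lambda]}. \]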

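The differential $dY_\lambda$ can be expressed in terms of the differential
$dY$:

\[ (dy_1^{(\lambda)}, \ldots, dy_n^{(\lambda)}) = (dy_1, \ldots, dy_n)\, J(y : \lambda), \]

where $J(y : \lambda) = (\partial y^{(\lambda)} / \partial y)$ is the Jacobian
matrix of the transformation, hence

\[ dY_\lambda = |J(y : \lambda)|\, dY. \]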

The differential $d[Y_\lambda]$ can be expressed in terms of differential
volume at $Y$ along the inverse image of the orbit $D_\lambda$. For this the
differential vector on the group is related to the corresponding differential
vector for the natural response,

\[ (dy_1^{(\lambda)}, \ldots, dy_n^{(\lambda)}) = (db_1, \ldots, db_r, ds)\, D_\lambda; \]

and the differential vector for the natural response can be related to the
corresponding differential vector at the observed response:

\[ (dy_1, \ldots, dy_n) = (dy_1^{(\lambda)}, \ldots, dy_n^{(\lambda)})\, J^{-1}(y : \lambda). \]

The composite transformation is

\[ (dy_1, \ldots, dy_n) = (db_1, \ldots, db_r, ds)\, D_\lambda J^{-1}(y : \lambda). \]

Hence†

\[ dl^{-1}([Y_\lambda]D_\lambda) = |D_\lambda J^{-2}(y : \lambda) D_\lambda'|^{1/2}\, d[Y_\lambda], \]
\[ d[Y_\lambda] = |D_\lambda J^{-2}(y : \lambda) D_\lambda'|^{-1/2}\, dl^{-1}([Y_\lambda]D_\lambda). \]

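The marginal element for $D_\lambda$ can now be expressed in terms of
cross-sectional Euclidean volume $dv$ at the observed response $Y$:

\[ \frac{1}{k(D_\lambda)}\, \frac{1}{s^{n-r-1}(y^{(\lambda)})}\, \frac{|J(y : \lambda)|}{|D_\lambda J^{-2}(y : \lambda) D_\lambda'|^{-1/2}}\, dv, \]

and for the case of normal error is

\[ \frac{1}{A_{n-r}\, |D_\lambda D_\lambda'|^{1/2}}\, \frac{1}{s^{n-r-1}(y^{(\lambda)})}\, \frac{|J(y : \lambda)|}{|D_\lambda J^{-2}(y : \lambda) D_\lambda'|^{-1/2}}\, dv \]

(note that $|VV'| = |D_\lambda D_\lambda'|$). The marginal likelihood function
for $\lambda$ from $D_\lambda$ is

\[ R^+(D_\lambda)\, \frac{|J(y : \lambda)|}{k(D_\lambda)\, s^{n-r-1}(y^{(\lambda)})\, |D_\lambda J^{-2}(y : \lambda) D_\lambda'|^{-1/2}}, \]

and for normal error is

\[ R^+(D_\lambda)\, \frac{|J(y : \lambda)|}{s^{n-r-1}(y^{(\lambda)})\, |D_\lambda J^{-2}(y : \lambda) D_\lambda'|^{-1/2}}. \]

The marginal likelihood function is the basis for inference concerning the
quantity $\lambda$.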
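For normal error the marginal likelihood reduces to $|J(y:\lambda)|\,|D_\lambda J^{-2}(y:\lambda) D_\lambda'|^{1/2} / s^{n-r-1}(y^{(\lambda)})$ up to a positive constant, and this can be computed directly in the simplest case. The sketch below assumes a single structural vector $\mathbf{v}_1 = (1, \ldots, 1)$ (so $r = 1$), takes $s$ to be the length of the residual vector so that the residual direction $\mathbf{d}$ has unit length, and uses the power family of the text; the data and the function names are hypothetical.

```python
import math

def transform(y, lam):
    """Power family: y^(lam) = y**lam for lam != 0 and ln y for lam == 0 (y > 0)."""
    return [math.log(v) for v in y] if lam == 0 else [v ** lam for v in y]

def jacobian_diagonal(y, lam):
    """Diagonal of J(y : lam), the derivative of y^(lam) with respect to y."""
    return [1.0 / v for v in y] if lam == 0 else [lam * v ** (lam - 1) for v in y]

def log_marginal_likelihood(y, lam):
    """log of |J| * |D J^-2 D'|^(1/2) / s^(n-r-1) for r = 1 and v_1 = (1, ..., 1);
    constants not involving lam are dropped."""
    n, r = len(y), 1
    z = transform(y, lam)
    jac = jacobian_diagonal(y, lam)
    b = sum(z) / n                                    # coefficient on v_1
    s = math.sqrt(sum((zi - b) ** 2 for zi in z))     # residual length
    d = [(zi - b) / s for zi in z]                    # unit residual vector
    w = [1.0 / (j * j) for j in jac]                  # diagonal of J^-2
    m11 = sum(w)                                      # v_1 J^-2 v_1'
    m12 = sum(wi * di for wi, di in zip(w, d))        # v_1 J^-2 d'
    m22 = sum(wi * di * di for wi, di in zip(w, d))   # d J^-2 d'
    log_det = math.log(m11 * m22 - m12 * m12)         # |D J^-2 D'| with D = (v_1; d)
    log_jac = sum(math.log(abs(j)) for j in jac)
    return log_jac + 0.5 * log_det - (n - r - 1) * math.log(s)

y = [1.2, 3.4, 2.2, 8.1, 5.0, 2.9]                    # made-up positive responses
for lam in (-1, 0, 0.5, 1):
    print(lam, log_marginal_likelihood(y, lam))
```

A change of measurement units, $y \to cy$, shifts every value of this log-likelihood by the same amount $-(n - r - 1)\ln c$, so comparisons between values of $\lambda$ are unaffected by the units of $y$.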

† Let $y = h(u)$ be a continuously differentiable transformation from $R^L$
into $R^N$. At $u$ let $J = (\partial h / \partial u)$ be the $L \times N$
Jacobian matrix; then $dy = du\,J$. Let $J = TO$ be the triangular-orthogonal
factorization of $J$ (Section 6, Chapter Three); then $dy = du\,TO$. At $y$ the
differential changes are in the $L$-dimensional subspace spanned by the rows of
$O$; these rows provide new axes for the subspace. In terms of the new axes the
transformation is from $du$ to $du\,T$. The Jacobian determinant is
$|T| = |TT'|^{1/2} = |JJ'|^{1/2}$.

NOTES AND REFERENCES

The likelihood function was promoted and developed in statistics by
R. A. Fisher (1922, 1925, 1934, 1956). The concept received rather little
attention from North American statisticians; notable exceptions are Birnbaum
(1962) and some related papers. Another concept promoted by Fisher
(1922, 1925, 1956), the concept of a sufficient statistic, received widespread
attention, however. A close relationship exists between the two concepts;
this was indicated informally by remarks in Fisher (1934) and was given
explicit recognition by G. Barnard and J. W. Tukey at statistical meetings
around 1960. The close relationship produces new and short methods of
analysis for sufficiency in the classical model (Fraser, 1966a, c). The notation
used for the likelihood function follows closely a notation proposed by
J. Bondar.

The concept of marginal likelihood was introduced in a preliminary form 
as residual likelihood (Fraser, 1964). For problems involving transformations 
of a response variable to obtain a linear model, Box and Cox (1964) applied
likelihood methods and a modified Bayesian method — the Bayesian method 
using the theoretically desperate device of choosing a prior distribution 
on the basis of the observed response. The concept of marginal likelihood 
was introduced in an explicit form in Fraser (1967) and applied there to the 
regression model as in Section 7. The resulting marginal likelihood function 
avoids the approximations needed by Box and Cox and provides greater 
sensitivity to the data than can be obtained with the likelihood or modified 
Bayesian results. The use of the marginal likelihood methods in Section 3 
avoids in a similar manner the need for the Bayesian methods in Box and 
Tiao (1962). A multivariate version of the Box and Cox problem has been 
analyzed by Fraser and L. M. Steinberg. 

The distribution of the correlation coefficient was derived by Fisher (1915). 
The analysis of the composite response model in Section 4 produces, as a 
byproduct, a simple derivation of this distribution; it avoids the intermediate 
derivation of the covariance-matrix distribution. The method for deriving a 
general distribution by the likelihood-modulation of a special distribution was 
introduced by Watson (1956) and Watson and Williams (1956) in an analysis 
of distributions on a sphere (to be examined in Section 2, Chapter Five). 
A survey of expressions for the correlation-coefficient distribution is given by 
Hotelling (1953). 

The normal distribution on the circle was proposed by Gumbel, Green- 
wood, and Durand (1953). The distribution had been used in other contexts 
and some related distribution theory solved (Kluyver, 1906; von Mises, 
1918; Rayleigh, 1919). A survey of results concerning distributions on the
circle is given by Stephens (1962).

PROBLEMS


1. Consider the simple measurement model with additional shape quantity $\beta$:

$$\prod_{i=1}^{n} f(e_i:\beta)\prod_{i=1}^{n} de_i,$$

$$x = [\theta, 1]e.$$

(i) Derive an expression for the structural distribution for $\theta$ given $\beta$.

(ii) Derive an expression for the marginal likelihood function for $\beta$.

(iii) For the case of normal error with scaling $\beta = \sigma$,


obtain the marginal likelihood explicitly. Show that the value $\sigma = s_x$ for the
quantity maximizes the marginal likelihood function.

(iv) (Continuation). Assume that $s_x$ has the distribution of $\chi(n - 1)^{-1/2}$ on $n - 1$ degrees
of freedom for $\sigma = 1$. Use the likelihood-modulation method at the end of Sections 4
and 5 to show that the general distribution of $s_x$ is that of $\sigma\chi(n - 1)^{-1/2}$ on $n - 1$ degrees
of freedom.

2. Consider the simple measurement model (Problem 1) with error distribution 

f(e\a) de = ^ exp — exp - j de 

and additional quantity $\sigma$. (Problem 16 in Chapter One examined the measurement model
based on a standardized form of the preceding error distribution.) (J. Whitney.)

(i) Derive the structural distribution for $\theta$.

(ii) Derive the marginal likelihood function for $\sigma$. Determine the equation for obtaining
the value of $\sigma$ that maximizes the marginal likelihood function for $\sigma$.

(iii) Derive the classical model $f(x:\theta, \sigma)$ for the response variable $x$. Obtain equations
for the $(\theta, \sigma)$ that maximizes the likelihood function based on the classical model.

(iv) Compare the equation for the appropriate $\sigma$ value in (ii) and the equation for the
inappropriate $\sigma$ value in (iii).

3. Consider the multiplicative measurement model (Problem 19, Chapter One) with
additional quantity $\delta$:

$$\prod_{i=1}^{n} f(e_i:\delta)\prod_{i=1}^{n} de_i,$$

$$x = [0, \theta]e. \qquad \text{(H. Levenbach.)}$$

(i) Derive an expression for the structural distribution for $\theta$ given $\delta$.

(ii) Derive an expression for the marginal likelihood function for $\delta$.

(iii) For the case of normal error with coefficient of variation $\delta$

f(e:5) = — L= exp {-He - <5) 2 } 

V2i T 

obtain the marginal likelihood function explicitly. Find the value for $\delta$ that maximizes the
marginal likelihood function. Use s(x) — let /(x) = Vjj — xfifA = 

Vndfil — n d 2 )'A' 

*(iv) Use the method of likelihood modulation (Sections 4, 5, and Notes and References) 
to obtain the general distribution of the essential variable t(x) based on the orbit. 

4. Consider a composite measurement model with additional quantity $\beta$:

$$f(E)\,dE = \prod_{1}^{n} f(e_{1i}, \ldots, e_{pi}:\beta)\prod_{1}^{n}(de_{1i}\cdots de_{pi}),$$

$$X = \theta E,$$

where


and $\theta$ is an element of the location-scale group


$$-\infty < a_j < \infty, \qquad 0 < c < \infty.$$


(i) Check that the transformations form a group. 

(ii) In the pattern of Section 4 in Chapter Three and Problems 1, 2 in Chapter Three
define a transformation variable and determine the reference point. Show that the model is
a conditional structural model with additional quantity $\beta$.

(iii) Derive the invariant differentials and the modular function.

(iv) Derive the distribution for error position; derive the conditional structural
distribution for $\theta$ given $\beta$.

(v) Derive the marginal likelihood function for $\beta$.

5. (Continuation). Consider the preceding composite measurement model and suppose the
error distribution is standard normal


(no additional quantity).

(i) Derive the distribution for error position $[E]$; use


as a convenient scale variable.

(ii) Derive the structural distribution for $(\mu_1, \ldots, \mu_p, \sigma)$.

6. For the composite response model with normal error ($p = 2$) in Section 4 derive the
equation that must be solved to obtain the value of $\rho$ that maximizes the marginal likelihood
function.

7. For the normal distribution

$$\frac{1}{2\pi\sigma^2}\exp\left\{-\frac{1}{2\sigma^2}\left[(e_1 - \mu)^2 + e_2^2\right]\right\}de_1\,de_2$$

in the plane determine the conditional distribution given that $e_1^2 + e_2^2 = 1$; relate the $\kappa$ of
the normal distribution on the circle to the $(\mu, \sigma)$ of the preceding distribution in the plane
(further grounds for the name "normal distribution on the circle").

8. Consider a simple composite-measurement model (known error scaling) with additional
quantity $\beta$:

$$f(E)\,dE = \prod_{1}^{n} f(e_{1i}, \ldots, e_{pi}:\beta)\prod_{1}^{n}(de_{1i}\cdots de_{pi}),$$

$$X = \theta E,$$

where


and $\theta$ is an element of the location group


(i) Check that the transformations form a group.

(ii) In the location pattern of Problem 4 define a transformation variable and determine
the reference point. Show that the model is a conditional structural model with additional
quantity $\beta$.

(iii) Derive the invariant differentials and the modular function.

(iv) Derive the distribution for the error position; derive the conditional structural
distribution for $\theta$ given $\beta$.

(v) Derive the marginal likelihood function for $\beta$.

9. (Continuation). Consider the preceding measurement model and suppose the component
error distribution is an uncorrelated normal:

$$f(e_1, \ldots, e_p:\sigma_1, \ldots, \sigma_p) = \frac{1}{(2\pi)^{p/2}\sigma_1\cdots\sigma_p}\exp\left\{-\frac{1}{2}\sum_{j=1}^{p}\frac{e_j^2}{\sigma_j^2}\right\}.$$

(i) Derive the distribution of the error position $[E]$. Derive the structural distribution for
$(\mu_1, \ldots, \mu_p)$ given $(\sigma_1, \ldots, \sigma_p)$.

(ii) Derive the marginal likelihood function for $(\sigma_1, \ldots, \sigma_p)$.

10. Consider the simple measurement model with additional quantity $\lambda$,

$$\prod_{i=1}^{n} f(e_i)\prod_{i=1}^{n} de_i,$$

$$l(x_1:\lambda) = [\theta, 1]e_1,$$

$$\vdots$$

$$l(x_n:\lambda) = [\theta, 1]e_n,$$

where $l(x:\lambda)$ is a continuously differentiable monotone function mapping the range of
$x$ onto $R^1$ ($x$ is the available response, $l(x:\lambda)$ is the natural response as based on the correct
value for $\lambda$; see Section 6 or 7).

(i) Derive an expression for the structural distribution for $\theta$ given $\lambda$.

(ii) Derive an expression for the marginal likelihood function for $\lambda$.

(iii) For the case of normal error,


obtain the marginal likelihood explicitly. Describe the value of $\lambda$ that maximizes the
marginal likelihood.

11. Consider the measurement model with additional quantity $\lambda$:

$$\prod_{1}^{n} f(e_i)\prod_{1}^{n} de_i,$$

$$l(x_1:\lambda) = [\mu, \sigma]e_1,$$

$$\vdots$$

$$l(x_n:\lambda) = [\mu, \sigma]e_n,$$

where $l(x:\lambda)$ is a continuously differentiable monotone function mapping the range of $x$
onto $R^1$ (see Section 6 or 7).

(i) Derive an expression for the structural distribution for $[\mu, \sigma]$ given $\lambda$.

(ii) Derive an expression for the marginal likelihood function for $\lambda$.

(iii) For the case of normal error,


obtain the marginal likelihood explicitly. Describe the value of $\lambda$ that maximizes the
marginal likelihood.

12. Consider the simple regression model with additional quantity $\lambda$:

$$f(E)\,dE = \prod_{1}^{n} f(e_i)\prod_{1}^{n} de_i,$$

$$Y^{(\lambda)} = \theta E,$$

$$y^{(\lambda)} = l(y:\lambda),$$

where


and $y^{(\lambda)} = l(y:\lambda)$ is a one-to-one continuously differentiable map carrying $y$ to $y^{(\lambda)}$ (cf.
Problem 17 in Chapter Three).

(i) Derive an expression for the structural distribution for $\theta$ given $\lambda$.

(ii) Derive an expression for the marginal likelihood function for $\lambda$.

13. (Continuation). Consider the simple regression model with normal component error


$$f(e)\,de$$

(cf. Problem 18, Chapter Three).


(i) Derive the structural distribution for $\theta$.

(ii) Derive the marginal likelihood function for $\lambda$.


*14. Consider the simple regression model


$$Y = \theta_\lambda E,$$


where


and $\lambda$ is an additional quantity; note that $\theta_\lambda$ relocates relative to $F(\lambda)$.

(i) Determine the structural distribution for $\beta$ given $\lambda$.

(ii) Determine the marginal likelihood for $\lambda$; describe the $\lambda$ value that maximizes the
marginal likelihood.


*15. Consider the regression model


and $\lambda$ is an additional quantity; note that $\theta_\lambda$ relocates relative to $F(\lambda)$.

(i) Determine the structural distribution for $\beta, \sigma$ given $\lambda$.

(ii) Determine the marginal likelihood for $\lambda$; describe the $\lambda$ value that maximizes the
marginal likelihood.

In some applications a structural model

$$f(E)\,dE,$$

$$X = \theta E$$

may have an error distribution that has symmetries with respect to its
transformation group $G$. Let $H$ be the set of transformations that leave
the error distribution unchanged:

$$H = \{g: f(g^{-1}E)\,dg^{-1}E = f(E)\,dE,\ g \in G\}$$

$$\phantom{H} = \{g: f(g^{-1}E) = f(E),\ g \in G\}.$$

The set $H$ consists of the transformations $g$ for which the variable $gE$ has
the same distribution as the variable $E$. Clearly if $g_1$ and $g_2$ are in $H$, then
$g_1g_2$ and $g_1^{-1}$ are in $H$. It follows that $H$ is a subgroup of $G$, the stabilizer
subgroup for the error distribution $f(E)$.

Let $h$ be a transformation in the stabilizer subgroup. The composite
transformation $\theta h$ applied to a value $E$ from the error variable would have
the same distribution as $\theta$ applied to a value from the error variable. It
follows that the transformations $\theta h$ for various $h$ in $H$ are equivalent
transformations in the group $G$. These equivalent transformations form a left
coset $\tau H$ of the stabilizer subgroup $H$ (see Figure 1); the cosets can be indexed
by an element $\tau$ of a complementing subgroup $H_2$. The general inference
concerning the quantity $\theta$ can be expressed in terms of the structural distribution:

$$g^*(\theta:X)\,d\theta = k([X]^{-1}X)\,f(\theta^{-1}X)\,J_N(\theta^{-1}X)\,\Delta([X])\,d\nu(\theta).$$

Figure 1 The stabilizer subgroup $H$ for the error distribution; the left coset $\tau H$; a
complementing subgroup $H_2$.

All values for $\theta$ in a left coset $\tau H$, however, are equivalent. It is natural then
to present inference in terms of the essential quantity $\tau$ in the complementing
subgroup $H_2$, that is, to obtain the marginal structural distribution for the essential
quantity $\tau$. The general formulation for marginal structural distributions
was given in Section 8, Chapter Two. In this chapter some structural models
with stabilizer subgroups are examined. The marginal structural distributions
for the essential quantities are derived directly.

1 A COMPOSITE MEASUREMENT MODEL WITH KNOWN SCALING 

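Consider a measurement process on the Euclidean plane $R^2$. Let $(e_1, e_2)$
be the error variable with known distribution

$$f(e_1, e_2)\,de_1\,de_2$$

on the Euclidean plane. Let $(x_1, x_2)$ designate a measurement; let $(\mu_1, \mu_2)$
designate the quantity being measured and $\varphi$ designate an unknown angle,
the angle through which an error value $(e_1, e_2)$ is rotated to give the difference
between the measurement and the quantity.

The model for a single measurement can be expressed as

$$f(e_1, e_2)\,de_1\,de_2,$$

$$\begin{pmatrix} 1 \\ x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ \mu_1 & \cos(\varphi) & -\sin(\varphi) \\ \mu_2 & \sin(\varphi) & \cos(\varphi) \end{pmatrix}\begin{pmatrix} 1 \\ e_1 \\ e_2 \end{pmatrix}.$$

The transformation

$$\theta = \begin{pmatrix} 1 & 0 & 0 \\ \mu_1 & \cos(\varphi) & -\sin(\varphi) \\ \mu_2 & \sin(\varphi) & \cos(\varphi) \end{pmatrix}$$

is an element of the translation-rotation group

$$G = \left\{ g = \begin{pmatrix} 1 & 0 & 0 \\ a_1 & \cos(h) & -\sin(h) \\ a_2 & \sin(h) & \cos(h) \end{pmatrix} : -\infty < a_j < \infty,\ 0 \le h < 2\pi \right\}.$$

The model for $n$ measurements is

$$\prod_{i=1}^{n} f(e_{1i}, e_{2i})\prod_{i=1}^{n} de_{1i}\,de_{2i},$$

$$X = \theta E,$$

where

$$E = \begin{pmatrix} 1 & \cdots & 1 \\ e_{11} & \cdots & e_{1n} \\ e_{21} & \cdots & e_{2n} \end{pmatrix}, \qquad X = \begin{pmatrix} 1 & \cdots & 1 \\ x_{11} & \cdots & x_{1n} \\ x_{21} & \cdots & x_{2n} \end{pmatrix}.$$

The matrix $X$ can be viewed as a point in $R^{2n}$ or as $n$ numbered points in
$R^2$. Toward defining a transformation variable, consider $X$ as $n$ numbered
points in $R^2$: if the points for $X$ can be carried point-for-point into the points
for $\tilde{X}$ by translation and rotation, then $X$ and $\tilde{X}$ are on the same orbit. As a
transformation variable consider

$$[X] = \begin{pmatrix} 1 & 0 & 0 \\ \bar{x}_1 & \cos(h(X)) & -\sin(h(X)) \\ \bar{x}_2 & \sin(h(X)) & \cos(h(X)) \end{pmatrix},$$

where

$$(\cos(h(X)), \sin(h(X))) = \left(\frac{x_{11} - \bar{x}_1}{r(X)}, \frac{x_{21} - \bar{x}_2}{r(X)}\right),$$

$$r^2(X) = (x_{11} - \bar{x}_1)^2 + (x_{21} - \bar{x}_2)^2.$$

Applying the transformation $[X]^{-1}$ to the point $X$ gives

$$[X]^{-1}X = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ r(X) & d_{12} & \cdots & d_{1n} \\ 0 & d_{22} & \cdots & d_{2n} \end{pmatrix},$$

where

$$d_{1i} = \cos(h(X))(x_{1i} - \bar{x}_1) + \sin(h(X))(x_{2i} - \bar{x}_2),$$

$$d_{2i} = -\sin(h(X))(x_{1i} - \bar{x}_1) + \cos(h(X))(x_{2i} - \bar{x}_2).$$

The transformation $[X]^{-1}$ translates the $n$ points so that the center of gravity
is at the origin and then rotates the points so that the vector from the center of
gravity to the point $(x_{11}, x_{21})$ is in the direction of the first axis (see Figure 2).
The model is a structural model for $n > 2$ (delete points $X$ having $x_{11} = \bar{x}_1$,
$x_{21} = \bar{x}_2$).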

Translation and rotation do not change Euclidean volume. Accordingly, 
the invariant differential on $R^{2n}$ is the Euclidean differential itself, and
similarly the invariant differential on the group is the Euclidean differential
$da_1\,da_2\,dh$.
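
The transformation variable and reference point defined above can be sketched numerically. This is a minimal illustration, not part of the original text: the function names `transformation_variable`, `reference_point`, and `act` are hypothetical, and points are stored as plain `(x1, x2)` tuples. The sketch checks that the reference point is unchanged when a translation-rotation is applied to the data, which is what it means for the reference point to depend only on the orbit.

```python
import math

def transformation_variable(points):
    """Return (xbar1, xbar2, h): the centroid and the angle of the first point about it."""
    n = len(points)
    xbar1 = sum(p[0] for p in points) / n
    xbar2 = sum(p[1] for p in points) / n
    h = math.atan2(points[0][1] - xbar2, points[0][0] - xbar1)
    return xbar1, xbar2, h

def reference_point(points):
    """Apply [X]^{-1}: move the centroid to the origin, then rotate by -h(X)."""
    xbar1, xbar2, h = transformation_variable(points)
    c, s = math.cos(h), math.sin(h)
    return [(c * (x1 - xbar1) + s * (x2 - xbar2),
             -s * (x1 - xbar1) + c * (x2 - xbar2)) for x1, x2 in points]

def act(g, points):
    """Apply a group element g = (a1, a2, h): rotate by h, then translate by (a1, a2)."""
    a1, a2, h = g
    c, s = math.cos(h), math.sin(h)
    return [(a1 + c * x1 - s * x2, a2 + s * x1 + c * x2) for x1, x2 in points]

# Illustrative data (assumed, not from the text).
X = [(0.0, 0.0), (2.0, 1.0), (1.0, 3.0)]
D = reference_point(X)
D_moved = reference_point(act((1.5, -2.0, 0.7), X))
```

The first column of the reference point has the form $(r(X), 0)$, and `D` and `D_moved` agree up to rounding error.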

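The distribution of the error position $[E]$ given the orbit $D(E) = D(X) = D$ is

$$k(D)\prod_{i=1}^{n} f(\bar{e}_1 + \cos(h)d_{1i} - \sin(h)d_{2i},\ \bar{e}_2 + \sin(h)d_{1i} + \cos(h)d_{2i})\,d\bar{e}_1\,d\bar{e}_2\,dh.$$

The structural distribution for $\theta$ is

$$k(D)\prod_{i=1}^{n} f(\cos(\varphi)(x_{1i} - \mu_1) + \sin(\varphi)(x_{2i} - \mu_2),\ -\sin(\varphi)(x_{1i} - \mu_1) + \cos(\varphi)(x_{2i} - \mu_2))\,d\mu_1\,d\mu_2\,d\varphi.$$

Suppose now that the error distribution is rotationally symmetric about the
origin:

$$f(\cos(h)e_1 + \sin(h)e_2,\ -\sin(h)e_1 + \cos(h)e_2) = f(e_1, e_2)$$

for $0 \le h < 2\pi$. The stabilizer subgroup for the error distribution is the
rotation group

$$H = \left\{ \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(h) & -\sin(h) \\ 0 & \sin(h) & \cos(h) \end{pmatrix} : 0 \le h < 2\pi \right\}.$$

A complementing subgroup in $G$ is the translation group

$$H_2 = \left\{ \begin{pmatrix} 1 & 0 & 0 \\ a_1 & 1 & 0 \\ a_2 & 0 & 1 \end{pmatrix} : -\infty < a_j < \infty \right\}.$$

The marginal structural distribution for the essential quantity

$$\tau = \begin{pmatrix} 1 & 0 & 0 \\ \mu_1 & 1 & 0 \\ \mu_2 & 0 & 1 \end{pmatrix}$$

is obtained by integration:

$$k(D)\int_0^{2\pi}\prod_{i=1}^{n} f(\cos(\varphi)(x_{1i} - \mu_1) + \sin(\varphi)(x_{2i} - \mu_2),\ -\sin(\varphi)(x_{1i} - \mu_1) + \cos(\varphi)(x_{2i} - \mu_2))\,d\varphi\,d\mu_1\,d\mu_2$$

$$= 2\pi k(D)\prod_{i=1}^{n} f(x_{1i} - \mu_1,\ x_{2i} - \mu_2)\,d\mu_1\,d\mu_2.$$
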

2 THE MEASUREMENT MODEL ON A SPHERE 

Consider a geologist measuring the direction of crystallization in
a certain rock structure, or a geophysicist measuring the earth's
magnetic field, or an astronomer measuring the direction of radio
waves. The quantity being measured is a direction. A direction can be recorded
as a unit vector in $R^3$, as a point on the unit sphere. The measurement model
on the sphere can be used to analyze the data.

Let $e = (e_1, e_2, e_3)'$ be the error variable, a point on the unit sphere in $R^3$.
And suppose that $e$ records error in relation to a reference direction $(1, 0, 0)'$
in $R^3$ (see Figure 3). Also suppose that the error distribution has been
identified except perhaps for an additional quantity $\kappa$:

$$f(e_1, e_2, e_3:\kappa)\,de;$$

the vector $e$ is restricted to the unit sphere, and the differential $de$ is Euclidean
area on the unit sphere.

The physical quantity is the general direction of the property being
investigated and the angle of rotation applied to an error value. Let $\theta$ be the
quantity describing the general direction and the error rotation:

$$\theta = (\omega_1, \omega_2, \omega_3) = \begin{pmatrix} \omega_{11} & \omega_{12} & \omega_{13} \\ \omega_{21} & \omega_{22} & \omega_{23} \\ \omega_{31} & \omega_{32} & \omega_{33} \end{pmatrix}.$$

The model is

Measurement Model on the Sphere

$$f(E:\kappa)\,dE = \prod_{i=1}^{n} f(e_{1i}, e_{2i}, e_{3i}:\kappa)\prod_{i=1}^{n} de_i,$$

$$X = \theta E.$$

The model has an error distribution describing the error in the measurement
process, and it has a structural equation in which a realized error $E$ has
determined the relation between the measurement $X$ and the quantity $\theta$.
For $n > 2$ the model is a conditional structural model with additional
quantity $\kappa$.

Consider the effect of a transformation $g$:

$$gX = g(x_1, \ldots, x_n).$$

The transformation $g$ rotates the $n$ points on the surface of the sphere but
maintains their relative positions. Toward defining a transformation variable
let $o_1(X)$ be a unit vector in the direction of the sum vector,

$$o_1(X) = \frac{1}{l(X)}\sum_{1}^{n} x_i,$$

where

$$l^2(X) = \left(\sum x_{1i}\right)^2 + \left(\sum x_{2i}\right)^2 + \left(\sum x_{3i}\right)^2.$$

Let $o_2(X)$ be the unit residual vector for $x_1$ after regression on $o_1(X)$. And let
$o_3(X)$ be the unique unit vector that then completes a right triad of three
orthonormal vectors. As a transformation variable consider

$$[X] = (o_1(X), o_2(X), o_3(X)) = O(X).$$

The application of the transformation $[X]^{-1}$ to the point $X$ gives

$$[X]^{-1}X = (d_1(X), \ldots, d_n(X)) = \begin{pmatrix} d_{11}(X) & d_{12}(X) & \cdots & d_{1n}(X) \\ d_{21}(X) & d_{22}(X) & \cdots & d_{2n}(X) \\ 0 & d_{32}(X) & \cdots & d_{3n}(X) \end{pmatrix} = D(X);$$

the sum vector for $D(X)$ is

$$\sum_{1}^{n} d_i(X) = \sum_{1}^{n} [X]^{-1}x_i = [X]^{-1}l(X)\,o_1(X) = (l(X), 0, 0)'.$$

The transformation $[X]^{-1}$ carries the original sum vector into a vector along
the reference vector $(1, 0, 0)$ and carries the first vector $x_1$ into the plane of the
first two axes. The point $D(X)$ is a reference point, and the transformation $[X]$
gives the position of $X$ relative to the reference point.
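
The construction of the triad $o_1(X), o_2(X), o_3(X)$ can be illustrated with a short sketch. It is only a sketch under stated assumptions: the helper names (`triad`, `reference_point`, `rotate_z`) are hypothetical, the sample directions are invented, and the test rotation is taken about the third axis purely for convenience. It checks that the sum vector of $D(X)$ lies along $(1, 0, 0)$, that the first point falls in the plane of the first two axes, and that $D(X)$ does not change when the data are rotated.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(u):
    r = math.sqrt(dot(u, u))
    return tuple(a / r for a in u)

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def triad(X):
    """Return o1, o2, o3 and the length l of the sum vector."""
    s = tuple(sum(x[j] for x in X) for j in range(3))
    l = math.sqrt(dot(s, s))
    o1 = unit(s)
    b = dot(X[0], o1)                       # regression of x_1 on o1
    o2 = unit(tuple(X[0][j] - b * o1[j] for j in range(3)))
    o3 = cross(o1, o2)                      # completes a right-handed triad
    return o1, o2, o3, l

def reference_point(X):
    """Apply [X]^{-1} = O(X)' to each point."""
    o1, o2, o3, _ = triad(X)
    return [(dot(o1, x), dot(o2, x), dot(o3, x)) for x in X]

def rotate_z(angle, X):
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in X]

# Illustrative unit vectors (assumed, not from the text).
X = [unit(v) for v in [(1.0, 0.2, 0.1), (0.8, -0.3, 0.4), (0.9, 0.1, -0.5)]]
o1, o2, o3, l = triad(X)
D = reference_point(X)
D_rot = reference_point(rotate_z(0.9, X))
```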

The Euclidean area elements on the surface of the sphere are invariant
under the rotations. For a Euclidean volume element $dO$ on the group
let $do_1$ be Euclidean area on the unit sphere in $R^3$ for $o_1$; and let $do_2$ be
Euclidean length on the unit circle in $R^2$ for $o_2$ (a unit vector orthogonal to
$o_1$). The differential $do_1\,do_2$ is invariant under left and right matrix
multiplication by positive orthogonal matrices.

The conditional distribution of the error position $[E] = O(E)$ given the
orbit $D$ is

$$g_\kappa(O:D)\,dO = k_\kappa(D)\prod_{1}^{n} f(Od_i:\kappa)\,do_1\,do_2.$$

For an assumed value for the quantity $\kappa$ the reduced structural model is

$$g_\kappa(O:D)\,dO,$$

$$O(X) = \theta O.$$

The structural distribution for the rotation $\theta$ conditional on the quantity
$\kappa$ is

$$k_\kappa(D)\prod_{1}^{n} f(\omega_1'x_i, \omega_2'x_i, \omega_3'x_i:\kappa)\,d\omega_1\,d\omega_2,$$

where for example $\omega_1'x_i$ is $(\omega_1, x_i)$, the inner product of $\omega_1$ and $x_i$.

The marginal probability element for $D$ is

3 THE MEASUREMENT MODEL ON THE SPHERE: NORMAL ERROR
A normal error distribution for the sphere has been proposed:

$$f(e_1, e_2, e_3:\kappa)\,de = \frac{\kappa}{4\pi\sinh(\kappa)}\exp\{\kappa e_1\}\,de.$$

The normalizing constant is easily checked:

$$\int\exp\{\kappa e_1\}\,de = \int_0^{\pi}\exp\{\kappa\cos(\epsilon)\}2\pi\sin(\epsilon)\,d\epsilon = \int_{-1}^{+1}\exp\{\kappa t\}2\pi\,dt$$

$$= \frac{2\pi}{\kappa}(\exp\{\kappa\} - \exp\{-\kappa\}) = \frac{4\pi\sinh(\kappa)}{\kappa};$$

the variable $\epsilon$ is used to designate the angle between $e$ and $(1, 0, 0)$. The
quantity $\kappa$ describes precision: with $\kappa = 0$, the distribution is uniform on the
sphere; with $\kappa$ large the distribution is concentrated near $(1, 0, 0)$ (compare
with the normal example in Section 5, Chapter Four).
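
The normalizing constant can also be checked numerically. The following is a small sketch, assuming a plain midpoint rule on the angle $\epsilon$; the function names `sphere_normalizer` and `closed_form` are illustrative and not from the text.

```python
import math

def sphere_normalizer(kappa, steps=20000):
    """Midpoint rule for the integral of exp(kappa cos(eps)) 2 pi sin(eps) over [0, pi]."""
    width = math.pi / steps
    total = 0.0
    for i in range(steps):
        eps = (i + 0.5) * width
        total += math.exp(kappa * math.cos(eps)) * 2.0 * math.pi * math.sin(eps)
    return total * width

def closed_form(kappa):
    """The value 4 pi sinh(kappa) / kappa obtained in the text."""
    return 4.0 * math.pi * math.sinh(kappa) / kappa

for kappa in (0.5, 2.0, 5.0):
    print(kappa, sphere_normalizer(kappa), closed_form(kappa))
```

The two columns should agree to several decimal places; as $\kappa$ tends to zero both tend to $4\pi$, the area of the unit sphere.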

The conditional distribution of the error position $O(E)$, given the orbit
$D$, is

$$g_\kappa(O:D)\,dO = k_\kappa(D)\frac{\kappa^n}{(4\pi\sinh(\kappa))^n}\exp\left\{\kappa\sum_{1}^{n}(o_{11}, o_{12}, o_{13})d_i\right\}do_1\,do_2$$

$$= \frac{\kappa l(X)}{4\pi\sinh(\kappa l(X))}\exp\{\kappa l(X)o_{11}\}\,do_1\cdot\frac{do_2}{2\pi}.$$

The conditional error distribution has two components: The distribution
of $o_1$ is a normal distribution on the sphere with precision $\kappa l(X)$; the
distribution of $o_2$ is uniform on the unit circle orthogonal to $o_1$.

The structural distribution for the rotation $\theta$ conditional on the quantity
$\kappa$ is

$$\frac{\kappa l(X)}{4\pi\sinh(\kappa l(X))}\exp\{\kappa l(X)(\omega_{11}o_{11}(X) + \omega_{21}o_{21}(X) + \omega_{31}o_{31}(X))\}\,d\omega_1\cdot\frac{d\omega_2}{2\pi}$$

$$= \frac{\kappa l(X)}{4\pi\sinh(\kappa l(X))}\exp\{\kappa l(X)(\omega_1, o_1(X))\}\,d\omega_1\cdot\frac{d\omega_2}{2\pi}.$$

The structural distribution for $\omega_1$ is the normal distribution with precision
$\kappa l(X)$ as relocated in the direction $o_1(X)$ of the sum vector $\sum_1^n x_i$; and the
structural distribution for $\omega_2$ is uniform on the unit circle orthogonal to $\omega_1$.
The marginal likelihood function for $\kappa$ is

$$\frac{4\pi\sinh(\kappa l(X))\,2\pi}{\kappa l(X)}\cdot\frac{\kappa^n}{(4\pi\sinh(\kappa))^n} \propto \frac{\kappa^{n-1}\sinh(\kappa l(X))}{\sinh^n(\kappa)}.$$
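
As an illustration of how the marginal likelihood might be used, the sketch below locates its maximum by bisection on the derivative of the log likelihood. It is not a method given in the text: the function names are hypothetical, the values $n = 10$ and $l = 8$ are invented, and the bisection assumes the score changes sign exactly once on the search interval (it is positive near zero when $l^2 > n$ and negative for large $\kappa$ when $l < n$).

```python
import math

def log_sinh(x):
    """Numerically stable log(sinh(x)) for x > 0."""
    return x + math.log1p(-math.exp(-2.0 * x)) - math.log(2.0)

def log_marginal_likelihood(kappa, l, n):
    """log of kappa^(n-1) sinh(kappa l) / sinh(kappa)^n, up to an additive constant."""
    return (n - 1) * math.log(kappa) + log_sinh(kappa * l) - n * log_sinh(kappa)

def score(kappa, l, n):
    """Derivative of the log marginal likelihood with respect to kappa."""
    return (n - 1) / kappa + l / math.tanh(kappa * l) - n / math.tanh(kappa)

def maximize(l, n, lo=1e-3, hi=1e3, iterations=200):
    """Bisection on the score; assumes score(lo) > 0 > score(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(mid, l, n) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical data summary: n = 10 directions with resultant length l = 8.
kappa_hat = maximize(8.0, 10)
```

For large $\kappa$ the score is close to $(n - 1)/\kappa + l - n$, so the maximizing value is near $(n - 1)/(n - l)$, which is 4.5 for these invented numbers.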
The marginal likelihood function depends on the orbit $D = D(E) = D(X)$
but only in terms of the length $l = l(E) = l(X)$. Correspondingly, the
distribution for the orbit $D = D(E)$ as it depends on $\kappa$ involves only the
length $l = l(E)$. It follows then that the general distribution for $l$,

$$h(l:\kappa)\,dl,$$

can be obtained from a special distribution such as

$$h(l:0)\,dl$$

by likelihood-modulation:

$$h(l:\kappa)\,dl = \frac{(\sinh(\kappa l)/\sinh^n(\kappa))\kappa^{n-1}}{l}\,h(l:0)\,dl.$$

(Compare with Section 4 in Chapter Four.) The distribution of $l(E)$ for
a uniform distribution on the sphere is available from probability theory:

$$h(l:0)\,dl = \frac{2l}{\pi}\int_0^{\infty}\frac{\sin^n(t)\sin(lt)}{t^{n-1}}\,dt\,dl = \frac{l}{2^{n-1}}\varphi_n(l)\,dl$$

as the distribution of the length of the sum of $n$ random unit vectors in $R^3$;
the function $\varphi_n(l)$ is

$$\varphi_n(l) = \frac{1}{(n-2)!}\sum_{s=0}^{n}(-1)^s\binom{n}{s}(n - l - 2s)_+^{n-2},$$

$$(t)_+ = t, \text{ if } t > 0,$$

$$\phantom{(t)_+} = 0, \text{ if } t \le 0.$$

The general distribution of $l$ is then

$$h(l:\kappa)\,dl = \frac{\kappa^{n-1}\sinh(\kappa l)}{\sinh^n(\kappa)}\cdot\frac{2}{\pi}\int_0^{\infty}\frac{\sin^n(t)\sin(lt)}{t^{n-1}}\,dt\,dl$$

$$= \frac{\kappa^{n-1}\sinh(\kappa l)}{\sinh^n(\kappa)\,2^{n-1}}\varphi_n(l)\,dl.$$
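
The finite-sum form of the length distribution is easy to evaluate. The sketch below is an illustration only: the function names and the midpoint-rule integrator are assumptions, and a term with $(n - l - 2s) \le 0$ is treated as zero (including the case $n = 2$, where the exponent is zero). It checks numerically that both $h(l:0)$ and the modulated $h(l:\kappa)$ integrate to one over $0 < l < n$.

```python
import math

def phi(n, l):
    """The finite sum phi_n(l) with the truncated power (t)_+."""
    total = 0.0
    for s in range(n + 1):
        t = n - l - 2 * s
        if t > 0:
            total += (-1) ** s * math.comb(n, s) * t ** (n - 2)
    return total / math.factorial(n - 2)

def h_uniform(n, l):
    """Density of the resultant length under the uniform distribution (kappa = 0)."""
    return l * phi(n, l) / 2 ** (n - 1)

def h_general(n, l, kappa):
    """Density of the resultant length obtained by likelihood-modulation."""
    return (kappa ** (n - 1) * math.sinh(kappa * l)
            / (math.sinh(kappa) ** n * 2 ** (n - 1))) * phi(n, l)

def integrate(f, a, b, steps=20000):
    """Midpoint rule, adequate for these piecewise polynomial densities."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

for n in (2, 3, 4):
    print(n, integrate(lambda l: h_uniform(n, l), 0.0, n))
print(integrate(lambda l: h_general(3, l, 2.0), 0.0, 3.0))
```

For $n = 2$ the uniform-case density reduces to $l/2$ on $0 < l < 2$, which can be confirmed directly from the geometry of two random unit vectors.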

4 THE MULTIVARIATE MODEL†

† The analysis of the multivariate model (Sections 4, 5, 6, 7) depends on definitions and
notation in Sections 10 and 11 of Chapter Three. The multivariate model may be omitted
on a first reading; the model, however, has a central place in mathematical statistics.

Consider a system with $p$ response variables $y_1, \ldots, y_p$. Suppose the internal
error, as it affects the responses, has been identified and can be described by
$p$ error variables $e_1, \ldots, e_p$ with a known distribution on $R^p$. Let $\mu_1, \ldots, \mu_p$
be the general levels for the $p$ response variables. And suppose that the error
variables affect the response levels by linear distortion: for the $j$th response
let $\gamma_{jr}$ be the coefficient applied to the $r$th error. A realized error vector and
the corresponding response vector are then connected by the equations

$$y_1 = \mu_1 + \gamma_{11}e_1 + \cdots + \gamma_{1p}e_p,$$

$$\vdots$$

$$y_p = \mu_p + \gamma_{p1}e_1 + \cdots + \gamma_{pp}e_p,$$

or by the matrix equation

$$\begin{pmatrix} 1 \\ y \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \mu & \Gamma \end{pmatrix}\begin{pmatrix} 1 \\ e \end{pmatrix}.$$

Now consider $n$ performances of the system and let $y_1 = (y_{11}, \ldots, y_{1n})$
be the observations for the first response, ..., and $y_p = (y_{p1}, \ldots, y_{pn})$ be
the observations for the $p$th response. The system and the $n$ performances
can then be described by the

Affine Multivariate Model

$$\prod_{i=1}^{n} f(e_{1i}, \ldots, e_{pi})\prod_{i=1}^{n} de_{1i}\cdots de_{pi},$$

$$\begin{pmatrix} 1 & \cdots & 1 \\ y_{11} & \cdots & y_{1n} \\ \vdots & & \vdots \\ y_{p1} & \cdots & y_{pn} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \mu & \Gamma \end{pmatrix}\begin{pmatrix} 1 & \cdots & 1 \\ e_{11} & \cdots & e_{1n} \\ \vdots & & \vdots \\ e_{p1} & \cdots & e_{pn} \end{pmatrix}.$$

The model has an error distribution with $e_1, \ldots, e_p$ as variables; and it has
a structural equation in which realized errors $e_1, \ldots, e_p$ have determined the
relation between the observations and the quantities.

The affine multivariate model can now be written:



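$$f(E)\,dE,$$

$$Y = \theta E.$$

The transformation $\theta$ is an element of the positive affine group on $R^p$:

$$G = \left\{ g = \begin{pmatrix} 1 & 0 \\ a & C \end{pmatrix} : -\infty < a_j < \infty,\ -\infty < c_{jr} < \infty,\ |C| > 0 \right\},$$

$$\begin{pmatrix} 1 & 0 \\ a & C \end{pmatrix}\begin{pmatrix} 1 & 0 \\ a^* & C^* \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ a + Ca^* & CC^* \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 \\ a & C \end{pmatrix}^{-1} = \begin{pmatrix} 1 & 0 \\ -C^{-1}a & C^{-1} \end{pmatrix}.$$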
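
The composition and inversion rules for the positive affine group can be checked with a short sketch for $p = 2$. The representation of a group element as a pair `(a, C)` and the function names are assumptions made for illustration; the numbers are invented.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(2)) for i in range(2)]

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv(A):
    d = det(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

def compose(g, g_star):
    """[a, C][a*, C*] = [a + C a*, C C*]."""
    (a, C), (a_star, C_star) = g, g_star
    shift = matvec(C, a_star)
    return [a[0] + shift[0], a[1] + shift[1]], matmul(C, C_star)

def inverse(g):
    """[a, C]^{-1} = [-C^{-1} a, C^{-1}]."""
    a, C = g
    C_inv = inv(C)
    moved = matvec(C_inv, a)
    return [-moved[0], -moved[1]], C_inv

# Two illustrative elements with positive determinant.
g = ([1.0, -2.0], [[2.0, 1.0], [0.5, 3.0]])
g_star = ([0.5, 4.0], [[1.0, -1.0], [2.0, 1.0]])
product = compose(g, g_star)
identity = compose(g, inverse(g))
```

The product again has a matrix part with positive determinant, so the set is closed under multiplication, and `identity` recovers $[0, I]$ up to rounding.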


The matrix $Y$ can be viewed as a point in $R^{pn}$ or, more conveniently here,
as $p$ points $y_1, \ldots, y_p$ in $R^n$. A transformation $g$ carries $1$ and $y_1, \ldots, y_p$ into
$1$ and $p$ vectors in $L(1, y_1, \ldots, y_p)$. The $p$ new vectors in $L(1, y_1, \ldots, y_p)$
are generated by a matrix with positive determinant; accordingly the $p$
new vectors with the 1-vector have the same orientation as do the $p$ original
vectors with the 1-vector.

Now suppose that $n > p + 1$ and that trivial observations $Y$ with $1$,
$y_1, \ldots, y_p$ linearly dependent are excluded. Let $L^+(1; y_1, \ldots, y_p)$ be the
$(p + 1)$-dimensional subspace $L(1, y_1, \ldots, y_p)$ together with an orientation,
the orientation of the $p + 1$ vectors $1, y_1, \ldots, y_p$. A transformation $g$
carries the vectors $y_1, \ldots, y_p$ of a nontrivial $Y$ into new vectors $\tilde{y}_1, \ldots, \tilde{y}_p$ in
the same subspace $L^+(1; y_1, \ldots, y_p)$ and with the same orientation relative
to the 1-vector.

The definition of a transformation variable can be facilitated by notation
from Section 10, Chapter Three:

$$Y = \begin{pmatrix} 1 & 0 \\ m(Y) & T(Y) \end{pmatrix}D^*(Y) = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ m_1(Y) & s_{(1)}(Y) & 0 & \cdots & 0 \\ m_2(Y) & t_{21}(Y) & s_{(2)}(Y) & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ m_p(Y) & t_{p1}(Y) & \cdots & t_{p\,p-1}(Y) & s_{(p)}(Y) \end{pmatrix}\begin{pmatrix} 1 \\ d_1^*(Y) \\ \vdots \\ d_p^*(Y) \end{pmatrix}$$

(an asterisk is added to the $D$-matrix of Chapter Three to distinguish it from
a reference point defined in this section). The second matrix $D^*(Y)$ contains
vectors $d_1^*(Y), \ldots, d_p^*(Y)$ obtained by successively orthogonalizing and
normalizing the vectors $y_1, \ldots, y_p$ in the sequence $1, y_1, \ldots, y_p$; the first
matrix contains the coordinates for the row vectors in $Y$ using as a basis the
row vectors in $D^*(Y)$.

Consider some additional notation. Let $y_1^0, \ldots, y_p^0$ be the projections of
$(1, 0, \ldots, 0), \ldots, \pm(0, \ldots, 0, 1, 0, \ldots)$ into the subspace $L(1, y_1, \ldots, y_p)$,
the sign of the last vector being chosen so that $y_1^0, \ldots, y_p^0$ have the same
orientation† as $y_1, \ldots, y_p$ in $L^+(1; y_1, \ldots, y_p)$; let $Y^0$ be the corresponding
matrix


and let

$$D(Y) = D^*(Y^0).$$

The matrix $D(Y)$ contains $p$ orthonormal vectors with appended 1-vector. The
matrix $D(Y)$ depends only on the oriented subspace $L^+(1; y_1, \ldots, y_p)$,
and not otherwise on the observation $Y$.

Now take $D(Y)$ as the reference point on the orbit described by
$L^+(1; y_1, \ldots, y_p)$; and let $[Y]$ be the positive affine transformation that carries
the row vectors of $D(Y)$ into the row vectors of $Y$:

$$Y = [Y]D(Y) = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ m_1(Y) & c_{11}(Y) & \cdots & c_{1p}(Y) \\ \vdots & \vdots & & \vdots \\ m_p(Y) & c_{p1}(Y) & \cdots & c_{pp}(Y) \end{pmatrix}D(Y) = \begin{pmatrix} 1 & 0 \\ m(Y) & C(Y) \end{pmatrix}D(Y).$$

† Exclude for convenience of definition further trivial $Y$ with $1, y_1^0, \ldots, y_p^0$ linearly
dependent.

The affine multivariate model can now be written:

$$f(E)\,dE,$$

$$Y = [Y]D(Y), \qquad D(Y) = D(E).$$

For $n > p + 1$ the affine multivariate model‡ is a structural model.

The transformation $[Y]$ can be expressed in terms of triangular and
orthogonal components. Let $[Y]_T$ be the transformation variable defined
for Section 10 in Chapter Three,

$$[Y]_T = \begin{pmatrix} 1 & 0 \\ m(Y) & T(Y) \end{pmatrix};$$

then

$$Y = [Y]_T D^*(Y).$$

And let $[Y]_O$ be the positive orthogonal matrix that generates the row vectors
of $D^*(Y)$ from the row vectors of $D(Y)$:

$$[Y]_O = \begin{pmatrix} 1 & 0 \\ 0 & O(Y) \end{pmatrix};$$

$$Y = [Y]_T D^*(Y), \qquad D^*(Y) = [Y]_O D(Y),$$

$$Y = [Y]_T[Y]_O D(Y) = [Y]D(Y).$$

It follows that the transformation variable can be factored,

$$[Y] = [Y]_T[Y]_O;$$

or equivalently

$$m(Y) = m(Y), \qquad C(Y) = T(Y)O(Y).$$

Note: The factorization $C(Y) = T(Y)O(Y)$ is the positive lower triangular-
orthogonal factorization described in Section 6, Chapter Three.
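
A positive lower triangular-orthogonal factorization of this kind can be computed by Gram-Schmidt orthogonalization of the rows. The sketch below is one way to do it, assumed for illustration rather than taken from Chapter Three; the function name and the sample matrix are hypothetical, and the rows of the input are assumed linearly independent.

```python
import math

def lower_triangular_orthogonal(C):
    """Factor C = T O with T lower triangular (positive diagonal) and O with orthonormal rows.

    Row j of C is expressed in terms of the first j orthonormal rows of O,
    which are built by Gram-Schmidt on the rows of C.
    """
    p = len(C)
    T = [[0.0] * p for _ in range(p)]
    O = []
    for j in range(p):
        residual = list(C[j])
        for k in range(j):
            T[j][k] = sum(C[j][i] * O[k][i] for i in range(p))
            residual = [residual[i] - T[j][k] * O[k][i] for i in range(p)]
        T[j][j] = math.sqrt(sum(x * x for x in residual))
        O.append([x / T[j][j] for x in residual])
    return T, O

# Illustrative matrix with positive determinant (assumed, not from the text).
C = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
T, O = lower_triangular_orthogonal(C)
```

Because the diagonal of $T$ is positive, the determinant of $O$ has the same sign as the determinant of $C$; for $|C| > 0$ the factor $O$ is therefore a positive orthogonal matrix.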

*5 THE MULTIVARIATE MODEL: DISTRIBUTIONS 

Consider the invariant differential on the response space. A transformation
$g$ applies column-by-column on the matrix $Y$. Its effect on the $i$th column is

$$\begin{pmatrix} y_{1i} \\ \vdots \\ y_{pi} \end{pmatrix} \longrightarrow a + C\begin{pmatrix} y_{1i} \\ \vdots \\ y_{pi} \end{pmatrix},$$

which has Jacobian $|C|$. Hence

$$J_N(g:Y) = |C|^n = |g|^n,$$

$$J_N(Y) = |C(Y)|^n = |[Y]|^n,$$

$$dm(Y) = \frac{\prod dy_{ji}}{|C(Y)|^n} = \frac{dY}{|[Y]|^n}.$$

‡ Excluding trivial points $Y$ with linear dependence among the rows.

Now consider the invariant differentials on the group:

$$\begin{pmatrix} 1 & 0 \\ \tilde{a} & \tilde{C} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ a & C \end{pmatrix}\begin{pmatrix} 1 & 0 \\ a^* & C^* \end{pmatrix}.$$

The left transformation operates column-by-column. For any given column
the Jacobian is $|C|$; hence

$$J = |C|^{p+1},$$

$$J(g) = |g|^{p+1},$$

$$d\mu(g) = \frac{dg}{|g|^{p+1}}.$$

The right transformation operates row-by-row. For any given row the Jacobian
is $|C^*|$; hence

$$J^* = |C^*|^p,$$

$$J^*(g) = |g|^p,$$

$$d\nu(g) = \frac{dg}{|g|^p}.$$

The modular function is

$$\Delta(g) = \frac{|g|^p}{|g|^{p+1}} = \frac{1}{|g|}.$$

5.1 General Distributions. The conditional probability element for the
error position $[E]$ given the orbit $D(E) = D$ is

$$g([E]:D)\,d[E] = k(D)f([E]D)\,|[E]|^n\frac{d[E]}{|[E]|^{p+1}}$$

$$= k(D)\prod_{1}^{n} f\left(m + C\begin{pmatrix} d_{1i} \\ \vdots \\ d_{pi} \end{pmatrix}\right)|C|^{n-p-1}\,dm\,dC.$$

The structural probability element for $\theta$ given $Y$ is

$$g^*(\theta:Y)\,d\theta = k(D(Y))f(\theta^{-1}Y)\frac{|[Y]|^{n-1}}{|\theta|^n}\,d\nu(\theta)$$

$$= k(D(Y))\prod_{1}^{n} f\left(\Gamma^{-1}\begin{pmatrix} y_{1i} - \mu_1 \\ \vdots \\ y_{pi} - \mu_p \end{pmatrix}\right)\frac{|C(Y)|^{n-1}}{|\Gamma|^n}\frac{d\mu\,d\Gamma}{|\Gamma|^p}.$$

The error distribution provides the basis for tests of significance; the
structural distribution provides the basis for general inference.

5.2 The Semidirect Decomposition. The structural equation for the
affine model,

$$m(Y) = \mu + \Gamma m(E),$$

$$C(Y) = \Gamma C(E),$$

can be separated into a part concerning the general level $\mu$ and a part
concerning the error scaling $\Gamma$:

$$C^{-1}(Y)(m(Y) - \mu) = C^{-1}(E)m(E) = t(E),$$

$$\Gamma^{-1}C(Y) = C(E).$$

The general level $\mu$ relates to the location subgroup

$$L = \{[a, I]: a \in R^p\},$$

and the error scaling $\Gamma$ relates to the positive linear subgroup

$$S = \{[0, C]: |C| > 0\}.$$

These definitions use the general location-scale notation of Problem 27 in
Chapter One. The general group element can be expressed uniquely as a
product, location-times-scale:

$$[a, C] = [a, I][0, C] = [a, C]_L[a, C]_S,$$

or uniquely as a product, scale-times-location:

$$[a, C] = [0, C][C^{-1}a, I] = [a, C]_S[a, C]_L.$$

(See Problem 19 in Chapter Two.)
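
Both product forms can be verified numerically for $p = 2$. This is a minimal sketch under the assumption that a group element $[a, C]$ is stored as a pair `(a, C)` and composed by the rule $[a, C][a^*, C^*] = [a + Ca^*, CC^*]$; the names and numbers are illustrative only.

```python
def compose(g, h):
    """[a, C][b, B] = [a + C b, C B] for 2 x 2 matrices stored as nested lists."""
    (a, C), (b, B) = g, h
    shift = [a[i] + sum(C[i][k] * b[k] for k in range(2)) for i in range(2)]
    prod = [[sum(C[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return shift, prod

def inv2(C):
    d = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    return [[C[1][1] / d, -C[0][1] / d], [-C[1][0] / d, C[0][0] / d]]

I2 = [[1.0, 0.0], [0.0, 1.0]]
zero = [0.0, 0.0]

# An illustrative element [a, C] with |C| > 0.
a = [1.0, -2.0]
C = [[2.0, 1.0], [0.5, 3.0]]
C_inv = inv2(C)
C_inv_a = [sum(C_inv[i][k] * a[k] for k in range(2)) for i in range(2)]

location_times_scale = compose((a, I2), (zero, C))        # [a, I][0, C]
scale_times_location = compose((zero, C), (C_inv_a, I2))  # [0, C][C^{-1} a, I]
```

Both products return the original element $[a, C]$; the location factor differs between the two forms ($a$ in the first, $C^{-1}a$ in the second), which is the point of distinguishing the two decompositions.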

5.3 The Location Distributions. Consider the quantity $\mu$ in relation to
the positive affine group. Specification of $\mu$ restricts the general quantity
$\theta$ to the left coset of the subgroup $S$:

$$\{[\mu, \Gamma]: |\Gamma| > 0\} = \{[\mu, I][0, \Gamma]: |\Gamma| > 0\} = [\mu, I]S.$$

Tests of significance and the marginal structural distribution are then based
on right cosets on the error space $G^*$. For example, the information $\mu = \mu_0$
leads to the value of the error characteristic $t(E)$,

$$t(E) = C^{-1}(E)m(E) = C^{-1}(Y)(m(Y) - \mu_0);$$

and this value of the error characteristic $t(E) = t$ restricts $[m(E), C(E)]$ to
a right coset:

$$\{[m, C]: C^{-1}m = t\} = \{[0, C][C^{-1}m, I]: C^{-1}m = t\} = S[t, I].$$

See Section 6 in Chapter Two for the univariate case $p = 1$.

The full error probability distribution,

$$k(D)f([E]D)\,|C|^{n-p-1}\,dm\,dC,$$

can be reexpressed by the substitution $m = Ct$,

$$k(D)f([E]D)\,|C|^{n-p-1}|C|\,dt\,dC,$$

and the marginal distribution for $t$ obtained by integration:

$$g_L(t:D)\,dt = k(D)\int_{|C|>0}\prod_{1}^{n} f\left(C\begin{pmatrix} t_1 + d_{1i} \\ \vdots \\ t_p + d_{pi} \end{pmatrix}\right)|C|^{n-p}\,dC\cdot dt.$$

The structural distribution for $\mu$ can then be obtained by substituting
$t = C^{-1}(Y)(m(Y) - \mu)$:

$$g_L^*(\mu:Y)\,d\mu = \frac{k(D)}{|C(Y)|}\int_{|C|>0}\prod_{1}^{n} f\left(CC^{-1}(Y)\begin{pmatrix} y_{1i} - \mu_1 \\ \vdots \\ y_{pi} - \mu_p \end{pmatrix}\right)|C|^{n-p}\,dC\cdot d\mu.$$


5.4 The Scale Distributions. Consider the quantity Γ in relation to the
positive affine group. Specification of Γ restricts the general quantity θ to a
left coset of the subgroup L:

$$\{[\mu, \Gamma] : \mu \in R^p\} = \{[0, \Gamma][\Gamma^{-1}\mu, I] : \mu \in R^p\} = [0, \Gamma]\,L.$$

Tests of significance and the marginal structural distribution are then
based on right cosets on the error space G*. For example, the information
Γ = Γ_0 leads to the value of the error characteristic C(E),

$$C(E) = \Gamma_0^{-1}\,C(Y);$$

and the value of the error characteristic C(E) = C restricts [m(E), C(E)]
to a right† coset:

$$\{[m, C] : m \in R^p\} = \{[m, I][0, C] : m \in R^p\} = L[0, C].$$

The marginal distribution of C can be obtained by integration:

$$g_S(C : D)\,dC = k(D)\int_{R^p} \prod_{i=1}^{n} f(m + C d_i)\,dm \cdot |C|^{n-p-1}\,dC.$$


The structural distribution for Γ can be obtained by the change of variable
C = Γ^{-1}C(Y) in the preceding distribution. The Jacobian evaluation for
this can be avoided by using the evaluation implicit in the derivation of the
full structural distribution. The marginal distribution of Γ can be obtained
then by integrating out the quantity μ in the full structural distribution:

$$g_S^*(\Gamma : Y)\,d\Gamma = k(D)\int_{R^p} \prod_{i=1}^{n} f\bigl(\Gamma^{-1}(y_i - \mu)\bigr)\,d\mu \cdot \frac{|C(Y)|^{n-1}}{|\Gamma|^{n+p}}\,d\Gamma.$$

*6 THE MULTIVARIATE MODEL: ROTATIONAL SYMMETRY 

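Suppose now that the error distribution is rotationally symmetric with
respect to the rotation group

$$G_O = \left\{ h = \begin{pmatrix} 1 & 0 \\ 0 & O \end{pmatrix} : O'O = I,\ |O| = 1 \right\},
\qquad O = \begin{pmatrix} o_{11} & \cdots & o_{1p} \\ \vdots & & \vdots \\ o_{p1} & \cdots & o_{pp} \end{pmatrix}:$$

$$f(hE) = f(E), \qquad h \in G_O.$$

† For the location subgroup left cosets are right cosets and right cosets are left cosets;
the location subgroup is a normal subgroup.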
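The rotation group is the stabilizer subgroup† for the error distribution.

A complementing subgroup is the location-progression group examined in
Section 9 of Chapter Three:

$$G_T = \left\{ k = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ a_1 & c_1 & 0 & \cdots & 0 \\ a_2 & k_{21} & c_2 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ a_p & k_{p1} & k_{p2} & \cdots & c_p \end{pmatrix} :
\begin{array}{l} -\infty < a_j < \infty \\ -\infty < k_{jj'} < \infty \\ 0 < c_j < \infty \end{array} \right\}.$$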


The analysis at the end of Section 4 shows that an element g of the positive
affine group can be represented uniquely as a product:

$$g = kh = [g]_T\,[g]_O, \qquad k = [g]_T \in G_T, \quad h = [g]_O \in G_O.$$
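The unique product representation can be illustrated on the linear part of a group element. The sketch below assumes the factorization of a matrix C with |C| > 0 into a lower-triangular factor T with positive diagonal and a rotation O is carried out by Gram-Schmidt orthogonalization of the rows of C; the function name `lq_decompose` is hypothetical and the method is offered as one way to compute the factors, not as the construction used in Section 4.

```python
import math

def lq_decompose(C):
    """Row-wise Gram-Schmidt: returns (T, O) with C = T O,
    T lower triangular with positive diagonal, O with orthonormal rows."""
    p = len(C)
    T = [[0.0] * p for _ in range(p)]
    O = []
    for j in range(p):
        r = list(C[j])
        for k in range(j):
            # coefficient of row j of C along the k-th orthonormal row
            T[j][k] = sum(C[j][i] * O[k][i] for i in range(p))
            r = [r[i] - T[j][k] * O[k][i] for i in range(p)]
        T[j][j] = math.sqrt(sum(x * x for x in r))
        O.append([x / T[j][j] for x in r])
    return T, O

C = [[2.0, 1.0], [0.5, 1.5]]             # |C| = 2.5 > 0
T, O = lq_decompose(C)

# Rebuild C from the factors and compute |O| for the 2 x 2 case.
rebuilt = [[sum(T[i][k] * O[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
det_O = O[0][0] * O[1][1] - O[0][1] * O[1][0]
print(T, O, det_O)
```

Because |C| > 0 and the diagonal of T is positive, the orthogonal factor has determinant +1 and so lies in the rotation group rather than in the full orthogonal group.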


The invariant differentials for the progression group (Section 11, Chapter
Three) are

$$d\mu_T(k) = \frac{dk}{|k|\,|k|_\Delta}, \qquad d\nu_T(k) = \frac{dk}{|k|_\nabla}, \qquad \Delta_T(k) = \frac{|k|_\nabla}{|k|\,|k|_\Delta}.$$

The invariant differentials‡ for the rotation group are

$$d\mu_O(h) = dh = dO, \qquad d\nu_O(h) = dh = dO, \qquad \Delta_O(h) = 1.$$

The composite differential

$$d\mu_T(k)\,d\nu_O(h)$$

for the variable kh is invariant under left multiplication by elements of G_T
and right multiplication by elements of G_O. The adjusted differential

$$\frac{d\mu(kh)}{\Delta(h)}$$

also has these invariance properties. At the identity the component
differentials dμ_T(k) and dν_O(h) both measure Euclidean volume, one
orthogonal to the other. Hence

$$d\mu(kh) = \Delta(h)\,d\mu_T(k)\,d\nu_O(h),$$

$$\frac{dg}{|g|^{p+1}} = \frac{d(kh)}{|kh|^{p+1}} = \frac{dk}{|k|\,|k|_\Delta}\cdot dh.$$

† The integrability of the density can be used to show that no further symmetries are
possible within the positive affine group.

‡ The differentials for an orthogonal matrix are composed of differentials on the surfaces of
spheres; these differentials are invariant under orthogonal transformations; cf. Section 2.

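The quantity θ can now be separated into an essential quantity [θ]_T and a
redundant quantity [θ]_O:

$$\theta = [\theta]_T\,[\theta]_O, \qquad [\theta]_T \in G_T, \quad [\theta]_O \in G_O,$$

where

$$[\theta]_T = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ \mu_1 & \sigma_{(1)} & 0 & \cdots & 0 \\ \mu_2 & \tau_{21} & \sigma_{(2)} & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ \mu_p & \tau_{p1} & \tau_{p2} & \cdots & \sigma_{(p)} \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ \mu & \mathcal{T} \end{pmatrix},$$

$$[\theta]_O = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & \omega_{11} & \cdots & \omega_{1p} \\ \vdots & \vdots & & \vdots \\ 0 & \omega_{p1} & \cdots & \omega_{pp} \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & \Omega \end{pmatrix};$$

note that Γ = 𝒯Ω. The structural distribution for θ can then be expressed
in terms of the component quantities: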

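$$g^*(\theta : Y)\,d\theta = k(D(Y))\,f\bigl([\theta]_O^{-1}[\theta]_T^{-1}Y\bigr)\,\frac{|[Y]_T|^{n-1}}{|[\theta]_T|^{n-1}}\cdot\frac{d[\theta]_T}{|[\theta]_T|\,|[\theta]_T|_\Delta}\,d[\theta]_O.$$
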

The redundant quantity can then be integrated out:

$$g^*([\theta]_T : Y)\,d[\theta]_T = k(D(Y))\int_{G_O} f\bigl([\theta]_O^{-1}[\theta]_T^{-1}Y\bigr)\,d[\theta]_O \cdot \frac{|[Y]_T|^{n-1}}{|[\theta]_T|^{n-1}}\cdot\frac{d[\theta]_T}{|[\theta]_T|\,|[\theta]_T|_\Delta}$$

$$= \prod_{j=2}^{p} A_j \cdot k(D(Y))\,f\bigl([\theta]_T^{-1}Y\bigr)\,\frac{|[Y]_T|^{n-1}}{|[\theta]_T|^{n-1}}\cdot\frac{d[\theta]_T}{|[\theta]_T|\,|[\theta]_T|_\Delta};$$

the integration is performed in the pattern in Section 3: the first-column
vector in Ω ranges over a unit sphere in R^p; the second conditionally over a
unit sphere in R^{p-1}; . . . ; the last unit vector conditionally is determined
(A_j is the area of the unit sphere in R^j).
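The sphere areas entering this integration are easy to evaluate. The sketch below uses the standard formula A_j = 2π^{j/2}/Γ(j/2) for the area of the unit sphere in R^j and multiplies the areas for j = p, p − 1, ..., 2, following the column-by-column pattern just described; the function names are illustrative.

```python
import math

def sphere_area(j):
    """Surface area A_j of the unit sphere in R^j: 2 * pi^(j/2) / Gamma(j/2)."""
    return 2.0 * math.pi ** (j / 2.0) / math.gamma(j / 2.0)

def rotation_group_volume(p):
    """Product A_p * A_(p-1) * ... * A_2; the last column is determined,
    so it contributes no further factor."""
    vol = 1.0
    for j in range(2, p + 1):
        vol *= sphere_area(j)
    return vol

for p in (2, 3, 4):
    print(p, sphere_area(p), rotation_group_volume(p))
```

For p = 2 the product reduces to the circumference 2π of the unit circle; for p = 3 it is 4π · 2π = 8π².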

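The essential quantity [θ]_T can be expressed in terms of the location and
scale components:

$$[\theta]_T = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ \mu_1 & \sigma_{(1)} & 0 & \cdots & 0 \\ \mu_2 & \tau_{21} & \sigma_{(2)} & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ \mu_p & \tau_{p1} & \tau_{p2} & \cdots & \sigma_{(p)} \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ \mu & \mathcal{T} \end{pmatrix}.$$
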

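The structural distribution can be expressed in terms of these components:

$$\prod_{j=2}^{p} A_j \cdot k(D(Y))\prod_{i=1}^{n} f\bigl(\mathcal{T}^{-1}(y_i - \mu)\bigr)\,\frac{|T(Y)|^{n-1}}{|\mathcal{T}|^{n-1}}\cdot\frac{d\mu\,d\mathcal{T}}{|\mathcal{T}|\,|\mathcal{T}|_\Delta},
\qquad y_i - \mu = (y_{1i} - \mu_1, \ldots, y_{pi} - \mu_p)'.$$
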


*7 THE MULTIVARIATE MODEL: NORMAL ERROR 

Consider the affine multivariate model with standard normal error variables:

$$f(E)\,dE = (2\pi)^{-np/2}\exp\Bigl\{-\tfrac12\sum e_{ji}^2\Bigr\}\prod de_{ji},$$
tn -»[£). D(Y)=D(E). 
is the adjusted position variable of the location-progression group (compare
with Section 12 in Chapter Three). The sum of squares can then be expressed
in terms of the triangular components:

$$\sum e_{ji}^2 = \operatorname{tr}[E][E]' - n = \operatorname{tr}[E]_T[E]_T' - n$$

$$= n\operatorname{tr} m(E)m'(E) + \operatorname{tr} T(E)T'(E)$$

$$= \Bigl(\sum n\,m_j^2(E)\Bigr) + \Bigl(\sum s_{(j)}^2(E)\Bigr) + \Bigl(\sum t_{jj'}^2(E)\Bigr).$$
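The decomposition of the sum of squares can be checked on a small numerical example. The sketch below takes p = 2 and assumes that m(E) is the vector of row means of E and that T(E) is the positive lower-triangular factor of the residual inner-product matrix (so that the triangular entries are s_(1), t_21, s_(2)); the data values and variable names are made up for illustration.

```python
import math

# A made-up 2 x 5 error matrix E (p = 2 responses, n = 5 observations).
E = [[0.3, -1.2, 0.8, 2.1, -0.5],
     [1.1, 0.4, -0.9, 0.2, -1.6]]
n = len(E[0])

# Location component: row means m_1(E), m_2(E).
m = [sum(row) / n for row in E]

# Residual inner-product matrix S(E) = R R' with R = E - m 1'.
R = [[e - m[j] for e in row] for j, row in enumerate(E)]
S = [[sum(R[j][i] * R[k][i] for i in range(n)) for k in range(2)]
     for j in range(2)]

# Positive lower-triangular factor T with T T' = S (2 x 2 Cholesky by hand).
s1 = math.sqrt(S[0][0])
t21 = S[1][0] / s1
s2 = math.sqrt(S[1][1] - t21 ** 2)

total = sum(e * e for row in E for e in row)
decomposed = n * sum(mj ** 2 for mj in m) + (s1 ** 2 + s2 ** 2) + t21 ** 2
print(total, decomposed)
```

The two printed values agree up to rounding: the location terms carry the factor n, while the diagonal and off-diagonal triangular entries enter with unit weight.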

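7.1 General Distribution: Error. The distribution of the error position
[E] given the orbit is

$$g([E]:D)\,d[E] = k(D)\,(2\pi)^{-np/2}\exp\{-\tfrac12(\operatorname{tr}[E][E]' - n)\}\,|[E]|^{n}\,\frac{d[E]}{|[E]|^{p+1}}.$$

The differential can be factored by the formula in the preceding section:

$$\frac{d[E]}{|[E]|^{p+1}} = \frac{d[E]_T}{|[E]_T|\,|[E]_T|_\Delta}\,d[E]_O = \frac{dm(E)\,dT(E)}{|T(E)|\,|T(E)|_\Delta}\,dO(E).$$

The distribution of [E] can then be expressed in terms of the components
m(E), T(E), O(E):

$$g([E]:D)\,d[E] = k(D)\,(2\pi)^{-np/2}\exp\{-\tfrac12(\operatorname{tr}[E][E]' - n)\}\,\frac{|T(E)|^{n}}{|T(E)|\,|T(E)|_\Delta}\,dm(E)\,dT(E)\,dO(E)$$

$$= \frac{n^{p/2}A_{n-1}\cdots A_{n-p}}{(2\pi)^{np/2}}\exp\Bigl\{-\tfrac12\sum n m_j^2 - \tfrac12\sum s_{(j)}^2 - \tfrac12\sum t_{jj'}^2\Bigr\}\,
s_{(1)}^{n-2}\cdots s_{(p)}^{n-p-1}\prod dm_j\prod ds_{(j)}\prod dt_{jj'}\cdot\frac{dO}{\prod_{j=2}^{p}A_j}.$$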


This distribution describes a collection of independent variables: standard
normal variables, chi-variables, and uniform-on-a-sphere variables. It
should be noted that the triangular variable [E]_T with components m(E),
T(E) does not describe the usual right cosets on the error space G*; it
describes left cosets of the orthogonal group. The distribution for m(E) and
T(E) agrees with that obtained in Section 12 of Chapter Three.
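7.2 General Distribution: Structural. The structural distribution can be
obtained from the general formula in Section 5.1 together with the factorization
of the differential (Section 6) and the normalization constant (Section 7.1):

$$g^*(\theta : Y)\,d\theta = k(D(Y))\,f(\theta^{-1}Y)\,\frac{|T(Y)|^{n-1}}{|\Gamma|^{n-1}}\cdot\frac{d\mu\,d\mathcal{T}\,d\Omega}{|\mathcal{T}|\,|\mathcal{T}|_\Delta}$$

$$= \frac{n^{p/2}A_{n-1}\cdots A_{n-p}}{(2\pi)^{np/2}\prod_{j=2}^{p}A_j}\exp\{-\tfrac12(\operatorname{tr}(\theta^{-1}YY'\theta'^{-1}) - n)\}\,\frac{|T(Y)|^{n-1}}{|\Gamma|^{n-1}}\cdot\frac{d\mu\,d\mathcal{T}\,d\Omega}{|\mathcal{T}|\,|\mathcal{T}|_\Delta}.$$

The sum of squares in the exponent can be re-expressed:

$$\sum e_{ji}^2 = \operatorname{tr} EE' - n = n\operatorname{tr} mm' + \operatorname{tr} CC'$$

$$= n\operatorname{tr}\Gamma^{-1}(m(Y) - \mu)(m(Y) - \mu)'\Gamma'^{-1} + \operatorname{tr}\Gamma^{-1}C(Y)C'(Y)\Gamma'^{-1}$$

$$= n(m(Y) - \mu)'(\Gamma\Gamma')^{-1}(m(Y) - \mu) + \operatorname{tr}(\Gamma\Gamma')^{-1}C(Y)C'(Y)$$

$$= n(m(Y) - \mu)'\Sigma^{-1}(m(Y) - \mu) + \operatorname{tr}\Sigma^{-1}S(Y),$$

where two inner-product matrices are defined by

$$\Sigma = \Gamma\Gamma' = \mathcal{T}\Omega\Omega'\mathcal{T}' = \mathcal{T}\mathcal{T}',$$

$$S(Y) = C(Y)C'(Y) = T(Y)O(Y)O'(Y)T'(Y) = T(Y)T'(Y)
= \begin{pmatrix} y_{11} - \bar y_1 & \cdots & y_{1n} - \bar y_1 \\ \vdots & & \vdots \\ y_{p1} - \bar y_p & \cdots & y_{pn} - \bar y_p \end{pmatrix}
\begin{pmatrix} y_{11} - \bar y_1 & \cdots & y_{1n} - \bar y_1 \\ \vdots & & \vdots \\ y_{p1} - \bar y_p & \cdots & y_{pn} - \bar y_p \end{pmatrix}'.$$

The structural distribution is

$$g^*(\theta:Y)\,d\theta = \frac{n^{p/2}A_{n-1}\cdots A_{n-p}}{(2\pi)^{np/2}\prod_{j=2}^{p}A_j}\exp\{-\tfrac{n}{2}(m(Y) - \mu)'\Sigma^{-1}(m(Y) - \mu) - \tfrac12\operatorname{tr}\Sigma^{-1}S(Y)\}\,\frac{|S(Y)|^{(n-1)/2}}{|\mathcal{T}|^{n}\,|\mathcal{T}|_\Delta}\,d\mu\,d\mathcal{T}\,d\Omega$$

$$= \frac{n^{p/2}\,|\Sigma|^{-1/2}}{(2\pi)^{p/2}}\exp\{-\tfrac12(m(Y) - \mu)'\,n\Sigma^{-1}(m(Y) - \mu)\}\,d\mu$$

$$\quad\cdot\ \frac{A_{n-1}\cdots A_{n-p}}{(2\pi)^{(n-1)p/2}}\exp\{-\tfrac12\operatorname{tr}\Sigma^{-1}S(Y)\}\,\frac{|S(Y)|^{(n-1)/2}}{|\mathcal{T}|^{n-1}}\cdot\frac{d\mathcal{T}}{|\mathcal{T}|_\Delta}\cdot\frac{d\Omega}{\prod_{j=2}^{p}A_j}.$$

The conditional distribution of μ given 𝒯 is normal with mean m(Y) and
covariance matrix n^{-1}Σ = n^{-1}𝒯𝒯'; the marginal distribution of 𝒯 is given by
the middle expression; and the distribution of Ω is uniform on the positive
rotation group, and is independent of μ, 𝒯.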

7.3 Location Distribution: Error. The distribution of the error component

$$m(E) = (m_1(E), \ldots, m_p(E))'$$

can be obtained from the error distribution (Section 7.1):

$$\frac{1}{(2\pi)^{p/2}}\exp\Bigl\{-\tfrac12\sum_{1}^{p} n\,m_j^2\Bigr\}\prod_{1}^{p} d\sqrt{n}\,m_j.$$

It should be noted that the error component m(E) does not describe the usual
right cosets on the error space G*; it describes left cosets of the positive
linear group S.

The distribution of the usual error location characteristic

$$t = t(E) = C^{-1}(E)\,m(E)$$

can be obtained from the formula in Section 5.3 together with the integration
properties for the distribution of [E] in the normal case.

7.4 Location Distribution: Structural. The marginal structural distribution
for the location quantity μ can be obtained from the final expression
in Section 7.2 by integration:

$$g_L^*(\mu:Y)\,d\mu = \frac{n^{p/2}}{(2\pi)^{p/2}}\cdot\frac{A_{n-1}\cdots A_{n-p}}{(2\pi)^{(n-1)p/2}}\int_{\mathcal{T}}\exp\{-\tfrac12\operatorname{tr}\Sigma^{-1}(S(Y) + n(m(Y) - \mu)(m(Y) - \mu)')\}\,\frac{|S(Y)|^{(n-1)/2}}{|\mathcal{T}|^{n}}\cdot\frac{d\mathcal{T}}{|\mathcal{T}|_\Delta}\cdot d\mu$$

$$= \frac{n^{p/2}A_{n-1}\cdots A_{n-p}\,|S(Y)|^{(n-1)/2}}{A_n\cdots A_{n-p+1}\,|S(Y) + n(m(Y) - \mu)(m(Y) - \mu)'|^{n/2}}
\cdot\int_{\mathcal{T}}\frac{A_n\cdots A_{n-p+1}}{(2\pi)^{np/2}}\exp\{-\tfrac12\operatorname{tr}\Sigma^{-1}S^*\}\,\frac{|S^*|^{n/2}}{|\mathcal{T}|^{n}}\cdot\frac{d\mathcal{T}}{|\mathcal{T}|_\Delta}\cdot d\mu$$

$$= n^{p/2}\,\frac{A_{n-p}}{A_n}\,|S(Y)|^{-1/2}\,|I + nS^{-1}(Y)(m(Y) - \mu)(m(Y) - \mu)'|^{-n/2}\,d\mu$$

$$= n^{p/2}\,\frac{A_{n-p}}{A_n}\,|S(Y)|^{-1/2}\,\bigl(1 + n(m(Y) - \mu)'S^{-1}(Y)(m(Y) - \mu)\bigr)^{-n/2}\,d\mu,$$

where S* = S(Y) + n(m(Y) − μ)(m(Y) − μ)';
the integration property associated with the marginal distribution of 𝒯
(Section 7.2) shows that the integral in the middle expression has value one;
also check in Chapter Three (Section 8.4 and Problem 40). The marginal
distribution of μ is a multivariate t-distribution but relocated and rescaled.

7.5 Scale Distribution: Error. The distribution of the error component
C(E) or the equivalent components T(E) and O(E) can be obtained from the
error distribution in Section 7.1:

$$\frac{A_{n-1}\cdots A_{n-p}}{(2\pi)^{(n-1)p/2}}\exp\Bigl\{-\tfrac12\sum s_{(j)}^2 - \tfrac12\sum t_{jj'}^2\Bigr\}\,
s_{(1)}^{n-2}\cdots s_{(p)}^{n-p-1}\prod dt_{jj'}\prod ds_{(j)}\cdot\frac{dO}{\prod_{j=2}^{p}A_j}.$$

This is a distribution describing right cosets on the error space G*. The
marginal distribution for T(E), however, does not describe the usual right
cosets; it describes left cosets of the rotation group.

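7.6 Scale Distribution: Structural. The marginal structural distribution
for the scale quantity Γ is directly available from the last expression in
Section 7.2:

$$g_S^*(\Gamma:Y)\,d\Gamma = \frac{A_{n-1}\cdots A_{n-p}}{(2\pi)^{(n-1)p/2}}\exp\{-\tfrac12\operatorname{tr}\Sigma^{-1}S(Y)\}\,\frac{|S(Y)|^{(n-1)/2}}{|\mathcal{T}|^{n-1}}\cdot\frac{d\mathcal{T}}{|\mathcal{T}|_\Delta}\cdot\frac{d\Omega}{\prod_{j=2}^{p}A_j}.$$

A related quantity of interest is the covariance matrix Σ = 𝒯𝒯'. The
Jacobian to obtain the distribution of Σ is available from Section 12 in
Chapter Three:

$$\frac{d\Sigma}{d\mathcal{T}} = 2^{p}\,|\mathcal{T}|_\nabla.$$

The structural distribution of Σ is

$$\frac{A_{n-1}\cdots A_{n-p}}{(2\pi)^{(n-1)p/2}}\exp\{-\tfrac12\operatorname{tr}\Sigma^{-1}S(Y)\}\,\frac{|S(Y)|^{(n-1)/2}}{|\mathcal{T}|^{n-1}}\cdot\frac{d\Sigma}{2^{p}\,|\mathcal{T}|_\Delta\,|\mathcal{T}|_\nabla}$$

$$= \frac{A_{n-1}\cdots A_{n-p}}{2^{p}(2\pi)^{(n-1)p/2}}\exp\{-\tfrac12\operatorname{tr}\Sigma^{-1}S(Y)\}\,\frac{|S(Y)|^{(n-1)/2}}{|\Sigma|^{(n+p)/2}}\,d\Sigma.$$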


NOTES AND REFERENCES 

Marginal and conditional structural distributions were examined in
Section 8, Chapter Two. The marginal analysis in this chapter has evolved
from an analysis of these distributions for essential and redundant quantities.

The normal error distribution on the sphere was proposed by Fisher
(1953) in an analysis of measurement on the sphere. As part of his analysis
he derived the marginal distribution of the length l = l(E). The conditional
distribution of the length l = l(E) given the orbit is independent of the orbit
(Section 3), hence is equal to the Fisher marginal distribution. Watson and
Williams (1956) used the likelihood-modulation method to simplify Fisher’s
derivation: the special distribution of l = l(E) for k = 0 had been obtained in
another context (Lord Rayleigh, 1919); the likelihood function adjusts the
special distribution to give the general distribution. A survey of results for
the normal distribution on the sphere is given in Stephens (1962).

The multivariate regression model has been analyzed by Fraser and M. S.
Haq.

Fisher, R. A. (1953), Dispersion on a sphere, Proc. Roy. Soc. (London), A217, 295-305.
Lord Rayleigh (1919), On the problem of random vibrations and random flights in one, 
two, and three dimensions, Phil. Mag., (6) 37, 321-347. 

Stephens, M. A. (1962), The statistics of directions, the Fisher and von Mises distributions, 
Ph.D. thesis, University of Toronto. 

Watson, G. S. and Williams, E. J. (1956), On the construction of significance tests on the 
circle and the sphere, Biometrika, 43, 344-352. 

PROBLEMS 


1. Consider the composite measurement model with known scaling (Section 1), and
suppose that the component error distribution is standard normal.

(i) Derive the distribution of error position [E]; for simplicity in the final expression
let

$$l^2(X) = \sum_{1}^{n}(x_{1i} - \bar x_1)^2 + \sum_{1}^{n}(x_{2i} - \bar x_2)^2.$$

(ii) Derive the structural distribution for (μ_1, μ_2, φ); derive the marginal distribution of
(μ_1, μ_2).

2. Consider the measurement model on the plane with unknown scaling and unknown error
rotation:

$$f(E)\,dE = \prod_{1}^{n} f(e_{1i}, e_{2i})\prod_{1}^{n}(de_{1i}\,de_{2i}),$$

$$X = \begin{pmatrix} 1 & 0 & 0 \\ \mu_1 & \sigma\cos\varphi & -\sigma\sin\varphi \\ \mu_2 & \sigma\sin\varphi & \sigma\cos\varphi \end{pmatrix} E,$$

and n > 2. Analyze the model following the pattern and notation in Section 1.

(i) Check that the transformations form a group.

(ii) Define a transformation variable, and determine the reference point. Show that the
model is a structural model. As a scale variable s(X) consider l(X) from Problem 1.

(iii) Derive the invariant differentials and the modular function.

(iv) Derive the distribution for error position and the structural distribution for θ.

(v) For a rotationally symmetric error distribution determine the marginal structural
distribution for (μ_1, μ_2, σ).

3 (Continuation). Consider the preceding measurement model and suppose that the
component error distribution is standard normal:

$$f(e_1, e_2) = \frac{1}{2\pi}\exp\{-\tfrac12(e_1^2 + e_2^2)\}.$$

(i) Derive the distribution of the error position [E]; for simplicity in the final expression
use

$$s^2(X) = \sum_{1}^{n}(x_{1i} - \bar x_1)^2 + \sum_{1}^{n}(x_{2i} - \bar x_2)^2.$$

(ii) Derive the structural distribution for (μ_1, μ_2, σ, φ); derive the marginal distribution
of (μ_1, μ_2).

4. Extend the model in Section 1 to cover measurements (x_1, . . . , x_p) on quantities
(μ_1, . . . , μ_p):

$$f(E)\,dE = \prod_{1}^{n} f(e_{1i}, \ldots, e_{pi})\prod_{1}^{n}(de_{1i}\cdots de_{pi}),$$

$$X = \theta E,$$

where

$$\theta = \begin{pmatrix} 1 & 0 \\ \mu & \Omega \end{pmatrix}, \qquad \mu = (\mu_1, \ldots, \mu_p)';$$

the quantity Ω is an element of the rotation group on R^p, and n > p. The group of
rotations on R^p is

$$G_O = \left\{ O = \begin{pmatrix} o_{11} & \cdots & o_{1p} \\ \vdots & & \vdots \\ o_{p1} & \cdots & o_{pp} \end{pmatrix} = (o_1, \ldots, o_p) : O'O = I,\ |O| = 1 \right\}.$$

An element O can be described as follows: the vector o_1 is an arbitrary unit vector in R^p
and ∫ do_1 = A_p; the vector o_2 is an arbitrary unit vector in the (p − 1)-dimensional space
orthogonal to o_1 and ∫ do_2 = A_{p−1} conditionally; . . . ; o_{p−1} is an arbitrary unit vector in
the 2-dimensional space orthogonal to o_1, . . . , o_{p−2} and ∫ do_{p−1} = A_2 = 2π conditionally;
o_p is the unique unit vector orthogonal to o_1, . . . , o_{p−1} such that o_1, . . . , o_p have the same
orientation as the coordinate axes in R^p.

(i) Check that the transformations form a group, the location-rotation group.

(ii) Define a transformation variable and determine the reference point. Show that the
model is a structural model.

(iii) Derive the invariant differentials and the modular function. Represent dO as
do_1 · · · do_{p−1} subject to the constraints in the description of an element O in G_O.

(iv) Derive the distribution for error position and the structural distribution for θ.

(v) For a rotationally symmetric error distribution determine the marginal structural
distribution for (μ_1, . . . , μ_p). (Compare with Problem 8 in Chapter Four.)

(vi) Suppose that the error distribution involves an additional quantity β. Obtain an
expression for the marginal likelihood function.

5 ( Continuation ). Consider the preceding measurement model and suppose that the error 
distribution is symmetrical normal : 




(i) Derive the distribution of the error position [£]; for simplicity in the final expression 
let 

l\X) =21 <*« - *,) 2 - 
i i 

Derive the marginal likelihood function for a. 

(ii) Derive the structural distribution for p x , . . . , p v , Q ; derive the marginal distribution 
of (p-x, . . . , p v ) (compare with Problem 9 in Chapter Four). 

6. Extend the model in Problem 2 to cover measurements (x x , . . . , x p ) on quantities 

(Pi> • • • 1 Pp)- 

f(E) dE = n f(e u , e pi ) JJ (de u • • ■ de pi ), 

1 1 

X = dE, 

where 



the quantity Q. is an element of the rotation group on R v , and n> p. 

(i) Check that the transformations form a group. 

(ii) Define a transformation variable and determine the reference point. Show that the 
model is a structural model. As a scale variable s(X) consider l(X) from Problem 5. 
(iii) Derive the invariant differentials and the modular function.

(iv) Derive the distribution for error position and the structural distribution for θ.

(v) For a rotationally symmetric error distribution determine the marginal structural
distribution for (μ_1, . . . , μ_p, σ).

(vi) Suppose that the error distribution involves an additional quantity β. Obtain an
expression for the marginal likelihood function.

7 (Continuation). Consider the preceding measurement model and suppose that the error
distribution is standard normal:

$$f(e_1, \ldots, e_p) = \frac{1}{(2\pi)^{p/2}}\exp\Bigl\{-\tfrac12\sum_{1}^{p} e_j^2\Bigr\}.$$

(i) Derive the distribution of the error position [E]; for simplicity in the final expression
use ...

(ii) Derive the structural distribution for (μ_1, . . . , μ_p, σ, Ω); derive the marginal
distribution of (μ_1, . . . , μ_p).

8. For the following symmetrical normal distribution in B?, 

ft*,.,:, -■ - -rT:"' ! h' ’’ ‘ + + 4 

9. Consider the positive linear group 


G= {<? = 


: 1^1 > 0 


operating on points 


Vn ' ‘ • 2/m 


in Euclidean space X = R* n by matrix multiplication: j 

Y = gY. 

(i) Check that G is a group and that the group is unitary on X provided n>p and 

“Sf define a transformation variable m - «*- derive 

w m tne P dlleiu , A . ,_ ft arif i r ;„ht invariant differentials on G. 

‘"m " l o°“.he “va^i differentiL deduce ,he value of .he Jacobian 


.. . 
the Jacobian for the progression group is different (compare with Problem 29 in Chapter
Three).

10 ( Continuation ). Consider the linear multivariate model for n observations on p responses: 

f(E) dE, 

Y = TE, 

where T is an element of the positive linear group. 

(i) Derive the distribution of the error position variable [£]. 

(ii) Derive the structural distribution for F. 

11 ( Continuation ). Suppose the error distribution is rotationally symmetric: 

fiO-'E) = /(E) 

for all rotations in the group 

°n ‘ 

O — 

c. °J)1 ’ " °PP 




A complementing subgroup is the progression subgroup of positive lower-triangular 
matrices 



r = |T][r] = - go , 

T 0 


(i) Derive the marginal structural distribution of 75. 

(ii) Express the preceding distribution in terms of the equivalent quantity 

s = ■bt 3 / = rr , > - 


. 12 ( Continuation ). Consider the case of standard normal component error, and let 

[E] = C(E) = [£][E] = T(E) 0(E). 

T O 

Derive 


(i) the distribution of [E] given the orbit in terms of T(E), 0(E)\ 

(ii) the distribution of T(E) given the orbit (not a right coset distribution)’, 

(iii) the distribution of the error inner-product matrix S(E) — EE' = T(E)T'(E) given 
the orbit (not a right coset distribution ) ; 

(iv) the structural distribution of T; 

(v) the marginal structural distribution of T5 ; 

(vi) the marginal structural distribution of S. 
13. Consider the affine multivariate model with rotational symmetry. For tests of
significance the decomposition of error position is needed in the order

$$[E] = [E]_O\,[E]_T$$

(cf. Sections 6, 7 in Chapter Two). Derive the marginal distribution of the error position
[E]_T, which indexes right cosets on the error space G*.

14 ( Continuation ). For the case of normal error derive the distribution of the error 
characteristic t = t(£) given the orbit; see Section 7.3. This is an error distribution 
that corresponds to right cosets. 

15 (< Continuation ). For the case of normal error derive the marginal distribution of the 
inner-product matrix for residuals 

S(E ) = T{E)T\E) 

(Compare with Section 12.3 in Chapter Three). This is not a right coset distribution. 

*16. Regression-linear model. Consider an error variable E 



with error distribution 
And consider a response matrix Y:



The regression-linear model is 

f(E) dE, f(E)dE 
or 

Y=6E, Y = 3V + YE 


• j. mv, ivyu XV.1UU3 U1 JUUULUUU. 


(ii) Consider the regression-positive linear group: 


_ j _ [l 0^1 B is a. p x r matrix ) 

l L B C J Cisap x p matrix with |C| > Oj 

Check that G is a group. Describe the orbits on R pn by using L + notation from Section 2 
in Chapter 3 and Section 4 in this chapter; show that G is unitary on R pn if n > p + r and 
a certain degenerate set of points is deleted. 

*17 ( Continuation ). Define a variable [7]; 

rv1 f 1 0 1 f 7 0 VJ 0 1 

l^CF) C(Y) J C7 i = (_5(F) T(F) J (_0 0(7) J ’ 

and a point D{Y) in R vn : 


- f v l 

i; l^(F)j 


Note: Intermediate D*(Y) and 7° can be defined here from Problem 33 in Chapter Three 
in the same way as corresponding matrices in Section 4 were defined from Section 10 in 
Chapter Three. Show that [7] is a transformation variable and D(Y) is a reference point. 
Check the alternative notation: [ B(Y ), C(7)] and 7 = B(Y)V + C(Y) D(Y). 
*18 (Continuation). Verify the following invariant differentials:

$$dm(Y) = \frac{dY}{|[Y]|^{n}} = \frac{dY}{|C(Y)|^{n}},$$

$$d\mu(g) = \frac{dg}{|g|^{p+r}} = \frac{dB\,dC}{|C|^{p+r}},$$

$$d\nu(g) = \frac{dg}{|g|^{p}} = \frac{dB\,dC}{|C|^{p}}.$$


*19 ( Continuation ). Derive the following distributions: 

r/[£] 

*([£]: D) d[E] = /c(D)/([£]D) |[£]| n j|^p+T- 


*(*»II/M • +c 


IfYll™ 1 

f*(8: Y)M = pp *(«> 


«o)riH r_1 


\C\n-V-r dB dC, 


|C(Y)| n ~ r c/33 rtf’ 

"“FF" l r l' 


*20 (Con,i«»a,ionY (i) Derive the right-coee. location distribution 

error variable H = C- x 5; derive the structural distnbutionfor ^(33. Y)c/33 for 33. 

(ii) Derive the right-coset scale distribution^C: J9) c/C for the errorvana e 

derive the structural distribution g*{T : Y)dT for T (cf. Problem 36, Chapter Three). 

*21 0 Continuation ). Suppose the error distribution is rotationally symmetric: 

/(/,-*£)=/(£) 

for all rotations in the group 

«.-KT »::i 

A complementing subgroup is the regression-progression group in Problem 32 in Chapter 
Three, 

i r I 0~) Bis & p X r matrix 1 

G t — Ik = I ^ j\ Tis a.p X p positive-lower-triangular) 
The invariant differentials for G_O are given in Section 6; the invariant differentials for G_T
are given in Problem 34, Chapter Three. Let

$$\theta = [\theta]_T\,[\theta]_O = \begin{pmatrix} I & 0 \\ \mathcal{B} & \mathcal{T} \end{pmatrix}\begin{pmatrix} I & 0 \\ 0 & \Omega \end{pmatrix}.$$

(i) Derive the marginal structural distribution of (33, G). 

(ii) Express the preceding structural distribution in terms of the equivalent quantity 
(33, E), where 

s = rr' = gts'. 

*22 ( Continuation ). Consider the case of standard normal component error. Let 

[£] = [£][£] = r 1 °if / 0 1 . 

T O \JKE) T(E) J [o O(E) J 
where C (£) = T(E) O(E). Note that EE' = (BV + CD)(BV + CD)' 

(i) Derive the distribution of [£], given orbit in terms of B(E), T(E), 0(E). Record the 
marginal distribution of B(E), T(E), given orbit ( not a right coset distribution). 

(ii) Derive the distribution of the error inner-product matrix for residuals 

S(E) = C(E) C'(E) = T(E) T\E) 

( not a right coset distribution). Compare with the distribution in Section 12.3 in Chapter 
Three and in Problem 15 in this chapter. 

(iii) Derive the structural distribution for 33, IS, Q. 

(iv) Determine the marginal structural distribution of 33 and the marginal structural 
distribution of G. 

(v) Determine the marginal structural distribution of E = IT' = GG'. __ 

*23 ( Continuation : case of normal error). Derive the distribution of the error characteristic 
H = H(E) = C- l (E)B(E), 

an analog of the r-variable in Section 7.3 and Problem 14. This is a right-coset distribution 
of the “general location” subgroup, the regression subgroup; it is the distribution appro- 
priate to tests of significance concerning location and to general inference concerning the 
quantity 33. 
CHAPTER SIX


Local Analysis


The structural models in preceding chapters are designed to describe systems
in which the primary quantity θ is a transformation in a group, a transformation
that carries an error value from within the system into a response value
on the chosen measurement scales. A change in the quantity θ is a change in
the transformation, and a change in the transformation produces a change in
the response value.

In some systems a weaker condition exists. A change in the quantity θ
is a change in a transformation carrying internal error into a response value,
and a change in the transformation produces a change in the response value.
The pattern of change in the response values, however, may be different near
one θ value than near another θ value; in other words, the transformations
may not produce a group of transformations on the possible response values.

In this chapter the weaker condition is examined for a real variable and a
real quantity. An increase in the quantity θ is assumed to cause an increase
in the response variable x. But the pattern of increase may be different near
different θ values.

1 THE STOCHASTICALLY MONOTONE MODEL 

Consider a real-valued quantity θ and a real-valued response x. Suppose
that an increase in θ produces an increase in x, and let the classical model for
x be given by the distribution function F(x:θ).

Suppose f(x:θ) = ∂F(x:θ)/∂x is continuously differentiable with respect
to x and θ. The total differential for F at θ_0 is

$$dF = \frac{\partial}{\partial x}F(x:\theta_0)\,dx + \frac{\partial}{\partial\theta}F(x:\theta)\Big|_{\theta=\theta_0}d\theta
= F_x(x:\theta_0)\,dx + F_\theta(x:\theta_0)\,d\theta.$$

With the probability level F held constant the differential dx that corresponds
to the differential dθ at θ_0 is

$$dx = -\frac{F_\theta(x:\theta_0)}{F_x(x:\theta_0)}\,d\theta.$$

Figure 1. A change dθ at θ_0; the corresponding change in the response variable at various
points x (at various values for F).

Suppose that this differential dx is not equal to dθ at some or all values of F.
Then let† l(x, θ_0) be a linearized variable, an increasing function of x that
has differential dl uniformly equal to dθ at θ_0:

$$dl = -\frac{F_x(x:\theta_0)}{F_\theta(x:\theta_0)}\,dx = d\theta,$$

$$l(x, \theta_0) = -\int^{x}\frac{F_x(x:\theta_0)}{F_\theta(x:\theta_0)}\,dx.$$

(See Figure 1.) The linearized variable is determined except for an additive
constant corresponding to the absent lower limit of integration.

Let H(l:θ) be the distribution function for the new variable l(x, θ_0):

$$H(l:\theta) = F(x(l, \theta_0):\theta),$$

where x(l, θ_0) is the inverse function obtained by solving

$$l(x, \theta_0) = l$$

for x.

For the new variable l the differential dl that corresponds to the differential
dθ at θ_0 is

$$dl = -\frac{H_\theta(l:\theta_0)}{H_l(l:\theta_0)}\,d\theta
= -\frac{F_\theta(x:\theta_0)}{F_x(x:\theta_0)\,\dfrac{dx(l, \theta_0)}{dl}}\,d\theta = d\theta.$$

This checks that the new variable l(x, θ_0) has the required property. Note that
H_θ(l:θ_0) = −H_l(l:θ_0) for all l.
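A worked instance may help fix the construction. The sketch below assumes, purely for illustration, the exponential model F(x:θ) = 1 − exp(−x/θ) with mean θ (not a model taken from the text). Here −F_x/F_θ = θ_0/x at θ_0, so the linearized variable is l(x, θ_0) = θ_0 log x up to an additive constant, and the property H_θ(l:θ_0) = −H_l(l:θ_0) can be checked by finite differences. All function names are hypothetical.

```python
import math

theta0 = 2.0   # the value of theta at which the variable is linearized

def F(x, theta):
    """Exponential distribution function with mean theta (illustrative model)."""
    return 1.0 - math.exp(-x / theta)

def l_of_x(x):
    """Linearized variable: dl = -(F_x / F_theta) dx = (theta0 / x) dx,
    so l = theta0 * log(x), taking the additive constant as zero."""
    return theta0 * math.log(x)

def x_of_l(l):
    """Inverse function x(l, theta0)."""
    return math.exp(l / theta0)

def H(l, theta):
    """Distribution function of the linearized variable."""
    return F(x_of_l(l), theta)

def partial(f, l, theta, wrt, h=1e-5):
    """Central finite difference of f(l, theta) in the named argument."""
    if wrt == "l":
        return (f(l + h, theta) - f(l - h, theta)) / (2 * h)
    return (f(l, theta + h) - f(l, theta - h)) / (2 * h)

for l in (-1.0, 0.5, 2.0):
    H_l = partial(H, l, theta0, "l")
    H_theta = partial(H, l, theta0, "theta")
    print(l, H_l, H_theta)   # the last two columns should sum to about zero
```

In this example the linearizing transformation is the logarithm, which turns the scale change in x into a location change in l at θ_0; for other models the transformation depends on θ_0 in a less familiar way.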


† The linearized variable l(x, θ) here should not be confused with the log-likelihood
function l(x:θ) in Chapter Four.
The model for the new variable can be expressed as

$$H(l - (\theta - \theta_0) : \theta_0)$$

to a first derivative approximation for θ near θ_0:

$$H(l - (\theta_0 - \theta_0):\theta_0) = H(l:\theta_0),$$

$$\frac{\partial}{\partial\theta}H(l - (\theta - \theta_0):\theta_0)\Big|_{\theta=\theta_0} = -H_l(l:\theta_0) = H_\theta(l:\theta_0).$$

The model, to a first derivative approximation at θ_0, can then be viewed as the
classical model based on the following simple measurement model:

$$H(e:\theta_0),$$

$$l = [(\theta - \theta_0), 1]\,e.$$

The assumptions at the beginning of the chapter present a change in θ as a
change in a transformation applied to internal error. The preceding simple
measurement model is then the appropriate model for inference near θ_0.

Now consider a sequence (x_1, . . . , x_n) of response values. The model for
the response sequence is

$$\prod_{1}^{n} F(x_i:\theta);$$

or, in terms of the transformed response sequence, (l_1, . . . , l_n) = (l(x_1, θ_0),
. . . , l(x_n, θ_0)), the model is

$$\prod_{1}^{n} H(l_i:\theta).$$

The related simple measurement model uses a position variable r(l) and an
orbital variable

$$d(l) = (l_1 - r(l), \ldots, l_n - r(l))'.$$

A simple choice is r(l) = l_1 and

$$d(l) = (0, l_2 - l_1, \ldots, l_n - l_1)'.$$

The marginal probability element for d is

$$\int_{-\infty}^{\infty}\prod_{1}^{n} H_l(t + d_i:\theta)\,dt\cdot dd_2\cdots dd_n.$$

A first derivative change in θ at θ_0 should shift the distribution of l along
the orbits; accordingly the marginal distribution of d should have a derivative
equal to zero at θ_0:
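$$\frac{\partial}{\partial\theta}\int_{-\infty}^{\infty}\prod_{1}^{n} H_l(t + d_i:\theta)\,dt\,\Big|_{\theta=\theta_0}
= \int_{-\infty}^{\infty}\sum_{1}^{n}\frac{H_{\theta l}(t + d_i:\theta_0)}{H_l(t + d_i:\theta_0)}\prod_{1}^{n} H_l(t + d_i:\theta_0)\,dt$$

$$= -\int_{-\infty}^{\infty}\sum_{1}^{n}\frac{H_{ll}(t + d_i:\theta_0)}{H_l(t + d_i:\theta_0)}\prod_{1}^{n} H_l(t + d_i:\theta_0)\,dt
= -\int_{-\infty}^{\infty}\frac{\partial}{\partial t}\prod_{1}^{n} H_l(t + d_i:\theta_0)\,dt$$

$$= -\prod_{1}^{n} H_l(t + d_i:\theta_0)\,\Big|_{-\infty}^{\infty} = 0.$$

The reduction uses H_{ll}(l:θ_0) = −H_{θl}(l:θ_0), a consequence of H_l(l:θ_0) =
−H_θ(l:θ_0).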

The conditional distribution for the location variable l_1 is

$$g(l_1:d, \theta)\,dl_1 = \frac{\prod_{1}^{n} H_l(l_1 + d_i:\theta)}{\int_{-\infty}^{\infty}\prod_{1}^{n} H_l(t + d_i:\theta)\,dt}\,dl_1.$$

The conditional distribution should have the same linearized property at
θ_0 as has the distribution of a single l; this is easily checked by showing that

$$\frac{\partial}{\partial\theta}\,g(l_1:d, \theta)\Big|_{\theta=\theta_0} = -\frac{\partial}{\partial l_1}\,g(l_1:d, \theta_0)$$

(see Problem 1).

The conditional model can be expressed as

$$g(l_1 - (\theta - \theta_0):d, \theta_0)$$

to a first derivative approximation for θ near θ_0. The model can then be
viewed as the classical model derived from the following reduced simple
measurement model:

$$g(e_1:d(l), \theta_0),$$

$$l_1 = (\theta - \theta_0) + e_1.$$

The assumptions at the beginning of the chapter then imply that the reduced
model is the appropriate model for inference near θ_0.
The structural probability element for θ at θ_0 is

$$g(l_1:d(l), \theta_0)\,d\theta = \frac{\prod_{1}^{n} H_l(l_i:\theta_0)}{\int_{-\infty}^{\infty}\prod_{1}^{n} H_l(t + d_i:\theta_0)\,dt}\,d\theta
= \frac{\prod_{1}^{n} F_\theta(x_i:\theta_0)}{\int_{-\infty}^{\infty}\prod_{1}^{n} F_\theta(x(t + d_i, \theta_0):\theta_0)\,dt}\,d\theta.$$

This element of probability at θ_0 uses the linearized variable l(x, θ_0) for
that θ value.

Now consider the marginal likelihood function based on the orbital
variable. The marginal element for d is


•o 0 71 

n 

J — 00 1 


Hft + d-i'.Qo) dt • dd 2 ' ■ ‘ dd n 
n 

= f 00 (-ir n ^(4* + d » 0 o):e o) dt — 

J— 00 1 1 


(’” n F s «t + So) - *(® 1 . »•). 0 o):0o) dt J^ dx 

= “ TT ^i^o) dk 

Ll F x (x i :6 0 ) 

The differential dt for the position variable on the orbit can be related to
differential length ds along the inverse image of the orbit, on the space of
the x's:

„ A , (dXiik + di,d 0 yf 2 

= ?«=?(■ — If, — 

_ 

rXF^-.dJ 

The marginal element for the orbital variable can now be expressed in terms
of Euclidean volume, on the space of the x's, cross-sectional to the inverse
image of the orbit. This gives the marginal likelihood function for θ:

r 2 

I" n fM‘ + 'te'S) - K^eioy.e) d, 

11 


The marginal likelihood function gives secondary information concerning 6, 
information derived from the orbit at the observed response vector. 
2 THE LINEARIZED POISSON

The Poisson model is used to describe the frequency of an event that can
occur randomly in an interval of space or time. The quantity θ is the mean
frequency for the interval of space or time. An increase in θ corresponds to
a compression of the process and a consequent stochastic increase in
the frequency. These are the necessary ingredients for the application of the
methods in the preceding section, except for the discreteness of the
distribution.

If the quantity θ is large, however, the distribution for the Poisson variable
spreads over a broad range of integers and is closely approximated by a
continuous distribution. This section applies the methods in the preceding
section to the approximating continuous distribution.

The Poisson distribution function

$$\sum_{a=0}^{x}\frac{\theta^{a}}{a!}\exp\{-\theta\}$$

can be viewed as giving the cumulative probability to the point x + ½ for
the approximating continuous distribution F(x:θ):

$$F(x + \tfrac12:\theta) = \sum_{a=0}^{x}\frac{\theta^{a}}{a!}\exp\{-\theta\}.$$

An alternative expression for F can be obtained by differentiating with
respect to θ and then integrating back:

$$\frac{\partial}{\partial\theta}F(x + \tfrac12:\theta) = -\sum_{a=0}^{x}\frac{\theta^{a}}{a!}\exp\{-\theta\} + \sum_{a=1}^{x}\frac{\theta^{a-1}}{(a-1)!}\exp\{-\theta\}
= -\frac{\theta^{x}}{x!}\exp\{-\theta\},$$

$$F(x + \tfrac12:\theta) = \int_{\theta}^{\infty}\frac{\theta^{x}}{x!}\exp\{-\theta\}\,d\theta = \int_{\theta}^{\infty}\frac{\theta^{x}}{\Gamma(x + 1)}\exp\{-\theta\}\,d\theta.$$

In this alternative form the distribution function extends smoothly and 
continuously for all values of x greater than x = — 
Figure 2. The ordinary Poisson probability function is given by the vertical bars. The
continuous probability function is designated $f(x:\theta)$. The continuous Poisson density does
not pass precisely through the tips of the bars; rather a bar gives the average height of
$f(x:\theta)$ on the unit interval centered at the base point of the bar.

This distribution on the range $(-\tfrac12, \infty)$ will be called the continuous Poisson
distribution. The probability for the continuous Poisson between $x - \tfrac12$ and
$x + \tfrac12$ for x an integer is equal to the probability at x for the ordinary
Poisson distribution (see Figure 2 and Problem 5).
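
The agreement between the summed Poisson probabilities and the integral form of F can be checked numerically. The following is a minimal sketch, not part of the original text: the function names, the choice of Simpson's rule, the truncation of the upper limit, and the test value theta = 4 are all assumptions made for illustration.

```python
import math

def poisson_cdf(x, theta):
    """Ordinary Poisson distribution function: sum of theta^k exp(-theta)/k!, k = 0..x."""
    return sum(math.exp(k * math.log(theta) - theta - math.lgamma(k + 1))
               for k in range(x + 1))

def continuous_poisson_cdf(point, theta, upper_extra=80.0, steps=20000):
    """F(point:theta) for point > -1/2, from the integral form by Simpson's rule.

    With x = point - 1/2 the integrand is t^x exp(-t) / Gamma(x + 1); the
    integral runs from theta to theta + upper_extra (an assumed cut-off,
    adequate here because the tail beyond it is negligible).
    """
    x = point - 0.5
    a, b = theta, theta + upper_extra
    h = (b - a) / steps  # steps must be even for Simpson's rule

    def integrand(t):
        return math.exp(x * math.log(t) - t - math.lgamma(x + 1))

    total = integrand(a) + integrand(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3

theta = 4.0
for x in range(8):
    # The two columns should agree: F(x + 1/2 : theta) equals the Poisson sum to x.
    print(x, round(poisson_cdf(x, theta), 6),
          round(continuous_poisson_cdf(x + 0.5, theta), 6))
```

At non-integer points the integral form simply interpolates between these values, which is the sense in which the distribution function extends smoothly.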

Now consider the linearized variable $l(x, \theta_0)$ for the continuous Poisson
distribution. The derivative with respect to $\theta$ is available from the preceding
analysis:

$$F_\theta(x:\theta) = -\frac{\theta^{x-1/2}}{\Gamma(x+\tfrac12)} \exp\{-\theta\}.$$

The derivative with respect to x seems unavailable explicitly. A series expansion
can be calculated with considerable difficulty; a first-order approximation
relative to $(1/\theta)$ is, however, available almost trivially: the density
function $f(x:\theta) = F_x(x:\theta)$ is closely approximated by the Poisson probability
bars (see Figure 2). This gives the approximation

$$f(x:\theta) \approx \frac{\theta^{x}}{\Gamma(x+1)} \exp\{-\theta\}.$$

The integrand for the linearizing transformation is

$$-\frac{F_x(x:\theta_0)}{F_\theta(x:\theta_0)}
  \approx \frac{\dfrac{\theta_0^{x}}{\Gamma(x+1)} \exp\{-\theta_0\}}
               {\dfrac{\theta_0^{x-1/2}}{\Gamma(x+\frac12)} \exp\{-\theta_0\}}
  = \theta_0^{1/2}\, \frac{\Gamma(x+\tfrac12)}{\Gamma(x+1)}.$$
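Stirling's formula,

$$\Gamma(x+1) = \sqrt{2\pi}\; x^{x+1/2} \exp\Big\{-x + \frac{1}{12x} - \cdots\Big\},$$

can be used to simplify the ratio of gamma functions:

$$-\frac{F_x(x:\theta_0)}{F_\theta(x:\theta_0)}
  \approx \theta_0^{1/2}\, \frac{(x-\tfrac12)^{x} \exp\{-x+\tfrac12\}}{x^{x+1/2} \exp\{-x\}}
  = \Big(\frac{\theta_0}{x}\Big)^{1/2} \Big(1 - \frac{1}{2x}\Big)^{x} \exp\{\tfrac12\}
  \approx \Big(\frac{\theta_0}{x}\Big)^{1/2}.$$

The approximation applies for x and $\theta_0$ large. The linearized variable is

$$l(x, \theta_0) = \int^{x} \theta_0^{1/2}\, t^{-1/2}\, dt = 2\,\theta_0^{1/2} x^{1/2}.$$

The form of the linearized variable suggests linearization with respect to
the modified quantity $\sqrt\theta$:

$$-\frac{F_x(x:\theta_0)}{F_{\sqrt\theta}(x:\theta_0)}
  = -\frac{F_x(x:\theta_0)}{F_\theta(x:\theta_0)} \cdot \frac{1}{[d\theta/d\sqrt\theta]_{\theta=\theta_0}}
  \approx \Big(\frac{\theta_0}{x}\Big)^{1/2} \frac{1}{2\sqrt{\theta_0}}
  = \frac{1}{2\sqrt{x}}.$$

The linearized variable relative to the quantity $\sqrt\theta$ at $\sqrt{\theta_0}$ is

$$l(x, \sqrt{\theta_0}) = \int^{x} \frac{dt}{2\sqrt{t}} = \sqrt{x}.$$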

This linearized variable does not depend on $\theta_0$; it is a linearized variable
that applies generally, provided, of course, that the quantity $\theta$ is large. It
follows then that the distribution of the error variable

$$e = \sqrt{x} - \sqrt{\theta}$$

does not depend on $\theta$, provided the quantity $\theta$ is large. The transformation
$\sqrt{x}$ can be referred to as a distribution-stabilizing transformation.
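
The stabilizing effect can be seen by computing the moments of $e = \sqrt{x} - \sqrt{\theta}$ directly from the ordinary Poisson probabilities. The sketch below is illustrative only: the function names, the truncation point of the sums, and the particular values of theta are assumptions, not taken from the text.

```python
import math

def poisson_pmf(k, theta):
    """Ordinary Poisson probability at k, computed on the log scale."""
    return math.exp(k * math.log(theta) - theta - math.lgamma(k + 1))

def sqrt_error_moments(theta):
    """Mean and standard deviation of e = sqrt(x) - sqrt(theta) for Poisson x."""
    upper = int(theta + 20 * math.sqrt(theta) + 50)  # assumed truncation of the sum
    probs = [poisson_pmf(k, theta) for k in range(upper + 1)]
    errors = [math.sqrt(k) - math.sqrt(theta) for k in range(upper + 1)]
    mean = sum(e * p for e, p in zip(errors, probs))
    var = sum((e - mean) ** 2 * p for e, p in zip(errors, probs))
    return mean, math.sqrt(var)

for theta in (25.0, 100.0, 400.0):
    mean, sd = sqrt_error_moments(theta)
    # The standard deviation should stay close to 1/2 whatever the value of theta.
    print(theta, round(mean, 4), round(sd, 4))
```

The printed standard deviations change very little as theta varies over a sixteen-fold range, whereas the standard deviation of x itself would change by a factor of four.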

In Summary: The Poisson model in a frequency application can be
approximated for large values of $\theta$ by the simple measurement model

$$e,$$

$$\sqrt{x} = \sqrt{\theta} + e.$$

The model has an error variable e with an approximate distribution (the
limiting form is examined in the next section); and the model has a measurement
$\sqrt{x}$, and a location quantity $\sqrt{\theta}$.
3 DISTRIBUTION OF THE LINEARIZED POISSON

For a Poisson variable x consider the distribution of the error variable

$$e = \sqrt{x} - \sqrt{\theta}$$

for large $\theta$. This can be examined most easily by treating x as a continuous
Poisson variable and using the ordinary Poisson probability function as the
approximating density function:

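$$f(x:\theta)\, dx = \frac{\theta^{x}}{x!} \exp\{-\theta\}\, dx
  = \frac{\theta^{x}}{\Gamma(x+1)} \exp\{-\theta\}\; 2\sqrt{x}\; d\sqrt{x}
  = \sqrt{\frac{2}{\pi}} \Big(\frac{\theta}{x}\Big)^{x}
    \exp\Big\{x - \theta - \frac{1}{12x} + \cdots\Big\}\, d\sqrt{x}.$$

Let $g(e:\theta)\, de$ be the probability element for the error variable e. Then

$$\begin{aligned}
\ln g(e:\theta)
 &= \ln\sqrt{\frac{2}{\pi}} + \Big(x - \theta - \frac{1}{12x} + \cdots\Big)
    - 2x\,(\ln\sqrt{x} - \ln\sqrt{\theta}) \\
 &= \ln\sqrt{\frac{2}{\pi}} + e^{2} + 2e\sqrt{\theta} - \frac{1}{12x} + \cdots
    - 2(e + \sqrt{\theta})^{2} \ln\Big(1 + \frac{e}{\sqrt{\theta}}\Big) \\
 &= \ln\sqrt{\frac{2}{\pi}} + e^{2} + 2e\sqrt{\theta} - \frac{1}{12x} + \cdots
    - 2(e^{2} + 2e\sqrt{\theta} + \theta)
      \Big(\frac{e}{\sqrt{\theta}} - \frac{e^{2}}{2\theta} + \frac{e^{3}}{3\theta^{3/2}}
           - \frac{e^{4}}{4\theta^{2}} + \cdots\Big) \\
 &= \ln\sqrt{\frac{2}{\pi}} - 2e^{2} - \frac{2}{3}\frac{e^{3}}{\sqrt{\theta}}
    - \frac{1 - 2e^{4}}{12\theta} \pm \cdots,
\end{aligned}$$

$$g(e:\theta)\, de = \sqrt{\frac{2}{\pi}} \exp\{-2e^{2}\}
  \exp\Big\{-\frac{2}{3}\frac{e^{3}}{\sqrt{\theta}} - \frac{1 - 2e^{4}}{12\theta} \pm \cdots\Big\}\, de.$$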
The limiting form of the distribution of the error variable e is normal with
mean 0 and standard deviation $\tfrac12$. The final factor can be expressed in terms
of power series in $(e/\sqrt{\theta})$; it describes the departure from the limiting form
of the stabilized distribution.

The series involved in the preceding analysis are power series that converge
for $|e| < \sqrt{\theta}$. It follows that the log-density converges uniformly in any
finite range for e as $\theta \to \infty$. The Poisson and normal densities decrease
monotonely about their maximum points; hence the distribution of the
error variable of the stabilized Poisson approaches the normal density
uniformly as $\theta \to \infty$.

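The Poisson distribution is usually approximated by using the variable

$$z = \frac{x - \theta}{\sqrt{\theta}}$$

as a standard normal variable. The limiting form of the distribution of z can
now be examined. The Poisson probability function treated as a density function
(Section 2) is

$$\frac{\theta^{x}}{x!} \exp\{-\theta\}\, dx
  = \frac{1}{\sqrt{2\pi}} \frac{\theta^{x}}{x^{x+1/2}}
    \exp\Big\{x - \theta - \frac{1}{12x} + \cdots\Big\}\, dx.$$

Let $h(z:\theta)$ be the corresponding density function for z; then

$$\begin{aligned}
\ln h(z:\theta)
 &= \ln\frac{1}{\sqrt{2\pi}} + \Big(x - \theta - \frac{1}{12x} + \cdots\Big)
    - (x + \tfrac12)(\ln x - \ln\theta) \\
 &= \ln\frac{1}{\sqrt{2\pi}} + z\sqrt{\theta} - \frac{1}{12x} + \cdots
    - (\theta + z\sqrt{\theta} + \tfrac12) \ln\Big(1 + \frac{z}{\sqrt{\theta}}\Big) \\
 &= \ln\frac{1}{\sqrt{2\pi}} + z\sqrt{\theta} - \frac{1}{12x} + \cdots
    - (\theta + z\sqrt{\theta} + \tfrac12)
      \Big(\frac{z}{\sqrt{\theta}} - \frac{z^{2}}{2\theta} + \frac{z^{3}}{3\theta^{3/2}}
           - \frac{z^{4}}{4\theta^{2}} + \cdots\Big) \\
 &= \ln\frac{1}{\sqrt{2\pi}} - \frac{z^{2}}{2} + \frac{z^{3} - 3z}{6\sqrt{\theta}}
    + \frac{3z^{2} - 1 - z^{4}}{12\theta} + \cdots.
\end{aligned}$$

Hence

$$h(z:\theta) = \frac{1}{\sqrt{2\pi}} \exp\Big\{-\frac{z^{2}}{2}\Big\}
  \exp\Big\{\frac{z^{3} - 3z}{6\sqrt{\theta}} + \frac{3z^{2} - 1 - z^{4}}{12\theta} + \cdots\Big\}.$$

The variable z has a limiting normal form with mean 0 and standard
deviation 1.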

The variable e has limiting error form,

$$e = \sqrt{x} - \sqrt{\theta},$$

as $\theta \to \infty$; the variable z,

$$z = \frac{x}{\sqrt{\theta}} - \sqrt{\theta},$$

does not. The first variable e can be compared with
the second variable z by introducing a factor 2:

$$e^{*} = 2e = 2\sqrt{x} - 2\sqrt{\theta}.$$

Now both variables, $e^{*}$ and z, have limiting standard normal distributions;
their densities are

$$g^{*}(e^{*}:\theta) = \frac{1}{\sqrt{2\pi}} \exp\Big\{-\frac{e^{*2}}{2}\Big\}
  \exp\Big\{-\frac{e^{*3}}{12\sqrt{\theta}} - \frac{1 - e^{*4}/8}{12\theta} \pm \cdots\Big\},$$

$$h(z:\theta) = \frac{1}{\sqrt{2\pi}} \exp\Big\{-\frac{z^{2}}{2}\Big\}
  \exp\Big\{\frac{2z^{3} - 6z}{12\sqrt{\theta}} - \frac{1 - 3z^{2} + z^{4}}{12\theta} \pm \cdots\Big\}.$$

The correction term for $e^{*}$ is considerably smaller than that for z.
Thus both the linearization with respect to the quantity and the rate
of approach to limiting normality favor the error variable

$$e = \sqrt{x} - \sqrt{\theta}$$

with an approximating normal distribution with mean 0 and standard
deviation $\tfrac12$.
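
One way to see the difference in the rate of approach to normality is to compare the skewness of the two standardized variables under the ordinary Poisson distribution. The following is a rough numerical sketch; the helper names, the truncation of the sums, and the choice theta = 100 are assumptions for illustration and do not come from the text.

```python
import math

def poisson_pmf(k, theta):
    """Ordinary Poisson probability at k, computed on the log scale."""
    return math.exp(k * math.log(theta) - theta - math.lgamma(k + 1))

def skewness(values, probs):
    """Third central moment divided by the 3/2 power of the variance."""
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    third = sum((v - mean) ** 3 * p for v, p in zip(values, probs))
    return third / var ** 1.5

theta = 100.0
ks = range(int(theta + 20 * math.sqrt(theta) + 50) + 1)  # assumed truncation
probs = [poisson_pmf(k, theta) for k in ks]

# z = (x - theta)/sqrt(theta) and e* = 2 sqrt(x) - 2 sqrt(theta)
skew_z = skewness([(k - theta) / math.sqrt(theta) for k in ks], probs)
skew_e_star = skewness([2 * (math.sqrt(k) - math.sqrt(theta)) for k in ks], probs)

print("skewness of z :", round(skew_z, 4))
print("skewness of e*:", round(skew_e_star, 4))
```

The skewness of $e^{*}$ comes out smaller in magnitude than that of z, in line with the smaller cubic term in its density.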

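Consider an example illustrating the approximating measurement model.
With observation x = 121 from a Poisson process, the approximating model is

$$\sqrt{\frac{2}{\pi}} \exp\{-2e^{2}\}\, de,$$

$$11 = \sqrt{\theta} + e.$$

A 95% probability interval for e is (−0.98, +0.98); a 95%
probability interval for $\sqrt{\theta}$ is 11 ± 0.98 and for $\theta$ is (100.4, 143.5).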
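
The arithmetic of the example can be sketched in a few lines. The function name and its default multiplier 1.96 (the two-sided 95% normal point) are assumptions introduced here for illustration.

```python
import math

def poisson_theta_interval(x, z=1.96):
    """Interval for sqrt(theta) and theta from sqrt(x) = sqrt(theta) + e.

    The error e is taken as normal with mean 0 and standard deviation 1/2,
    so the half-width on the square-root scale is z * 0.5.
    """
    half_width = z * 0.5
    root = math.sqrt(x)
    lo_root = max(0.0, root - half_width)
    hi_root = root + half_width
    return (lo_root, hi_root), (lo_root ** 2, hi_root ** 2)

root_interval, theta_interval = poisson_theta_interval(121)
print(root_interval)   # approximately (10.02, 11.98)
print(theta_interval)  # approximately (100.4, 143.5)
```

Squaring the end points is legitimate because the square is monotone on the positive axis; the resulting interval for $\theta$ is not symmetric about 121.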
PROBLEMS


1. Show that the conditional distribution of $l_1$ given the orbit $d(l)$ has the property of
uniform shift under first derivative ($\theta$) change at $\theta_0$; that is, show that




(For details see Section 1.) 

2. Let $F(x:\theta)$ be the Pareto distribution

$$F(x:\theta) = 1 - x^{-\theta}, \qquad x > 1, \quad \theta > 0.$$

Derive the linearized variable $l(x:\theta_0)$. Can the local linearization (at $\theta_0$) be extended to a
global linearization (all $\theta$)? Extend if possible. (D. R. Brillinger, 1963.)

3. Let $F(x:\theta)$ be the Weibull distribution with density



$x > 0$,
$\theta > 0$.


Derive the linearized variable $l(x:\theta_0)$. Can the local linearization be extended to a global
linearization? Extend if possible. (J. Whitney.)
4. Let $F(x:\theta)$ be the chi-model with density


Six 


1 


rl - 1 


f(x:Q) d i 2 f i‘ l ~ x 6X ^ [ 20 2 j 


Derive the linearized variable $l(x:\theta_0)$. Can the local linearization be extended to a global
linearization? Extend if possible.

5. Let $\langle x \rangle$ be the nearest integer to x.

(i) If x has the continuous Poisson distribution $F(x:\theta)$ with quantity $\theta$, check that $\langle x \rangle$
has the ordinary Poisson distribution with quantity $\theta$. (Y. b. Lee.)

(ii) Check that r 

F(* + 1:0) ~ ~ R 6 ) = 
Inference from Frequencies


The preceding chapters indicate the broad range of applications for the 
structural model. They also indicate some areas and directions that cannot be 
covered, or covered completely, by the structural model. For example, the 
additional quantity in Chapter Four cannot be described directly by the 
quantity of a structural model; the stochastically increasing model and 
the Poisson model in Chapter Six are not exact structural models. 

Without any structuring relationship between the quantity and the 
response, there remains, in the discrete case, only the probability function to 
identify a response value and, in the continuous case, only the likelihood 
function to identify a response value (Section 1 , Chapter Four). 

The case of a discrete response variable is examined in this chapter. With 
multiple observations the composite response can be recorded alternatively 
by giving the frequency for each of the possible probability functions. For a 
large number of observations these frequencies acquire a position relation- 
ship to their probabilities in the same manner as with the Poisson model in 
Chapter Six. This provides the basis for large-sample structural inference. 

The continuous response variable, with inference based on likelihood, is 
examined in Chapter Eight. 

1 FREQUENCY MODELS AND THE POISSON BASIS 

The Poisson model is perhaps the simplest statistical model with a
frequency variable:

$$f(x:\theta) = \frac{\theta^{x}}{x!} \exp\{-\theta\}, \qquad x = 0, 1, 2, \ldots.$$

A slightly more complex model is the binomial model, which describes the
frequencies $x_1$, $x_2$ of the occurrence, nonoccurrence of an event in n
performances of a process:

$$f(x_1, x_2 : p_1, p_2) = \binom{n}{x_1\; x_2} p_1^{x_1} p_2^{x_2},
  \qquad x_j \ge 0, \quad x_1 + x_2 = n,$$

where $p_1 + p_2 = 1$ ($p_j \ge 0$) and

$$\binom{n}{x_1\; x_2} = \frac{n!}{x_1!\, x_2!} = \binom{n}{x_1} = \binom{n}{x_2}$$

is the combinatorial function.

A generalization is the multinomial model, which describes the frequencies
$x_1, \ldots, x_r$ of occurrence of events $E_1, \ldots, E_r$ in n performances of a process,
the events $E_1, \ldots, E_r$ for a performance being mutually exclusive and
exhaustive:

$$f(\mathbf{x}:\mathbf{p}) = \binom{n}{x_1 \cdots x_r} p_1^{x_1} \cdots p_r^{x_r},
  \qquad x_j \ge 0, \quad \sum x_j = n,$$

where $\sum p_j = 1$ ($p_j \ge 0$) and

$$\binom{n}{x_1 \cdots x_r} = \frac{n!}{x_1! \cdots x_r!}$$

is the generalized combinatorial function. The quantity p may be restricted
and may depend on an essential, but simpler, quantity $\theta$: $\mathbf{p} = \mathbf{p}(\theta)$.

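Several multinomial models can be combined to form a single composite
multinomial model:

$$f(\mathbf{x}_1, \ldots, \mathbf{x}_k : \mathbf{p}_1, \ldots, \mathbf{p}_k)
  = \prod_{i=1}^{k} \binom{n_i}{x_{1i} \cdots x_{r_i i}} p_{1i}^{x_{1i}} \cdots p_{r_i i}^{x_{r_i i}},
  \qquad x_{ji} \ge 0, \quad \sum_j x_{ji} = n_i,$$

where $\sum_j p_{ji} = 1$ ($p_{ji} \ge 0$) for each i. Again the quantities $\mathbf{p}_1, \ldots, \mathbf{p}_k$ may
depend on a simpler quantity $\theta$.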

Now consider a finite population of N elements with $m_1$ elements of kind
$E_1, \ldots, m_r$ elements of kind $E_r$ ($\sum m_j = N$). The hypergeometric model
describes a succession of k random samples that exhaust the population;
on the ith sample let $x_{1i}, \ldots, x_{ri}$ ($= \mathbf{x}_i$) designate the frequencies of events
$E_1, \ldots, E_r$; the ith sample size is $n_i$ and $\sum n_i = N$:

$$f(\mathbf{x}_1, \ldots, \mathbf{x}_k)
  = \frac{\prod_i n_i!\; \prod_j m_j!}{N!\; \prod_{j,i} x_{ji}!},
  \qquad x_{ji} \ge 0, \quad \sum_j x_{ji} = n_i, \quad \sum_i x_{ji} = m_j,$$

$$= \frac{\prod_i \binom{n_i}{x_{1i} \cdots x_{ri}}}{\binom{N}{m_1 \cdots m_r}}
  = \frac{\prod_j \binom{m_j}{x_{j1} \cdots x_{jk}}}{\binom{N}{n_1 \cdots n_k}}.$$

The $m_1, \ldots, m_r$ may be quantities and may depend on an essential but
simpler quantity $\theta$.

The Poisson model provides a basis for analyzing the other models. 


Consider r independent Poisson variables $x_1, \ldots, x_r$ with means $\rho p_1,
\ldots, \rho p_r$ ($\sum p_j = 1$). The composite model for the x's is

$$\frac{\prod (\rho p_j)^{x_j}}{\prod x_j!} \exp\{-\rho\};$$

the conditional model given that $\sum x_j = n$ is

$$\frac{\dfrac{\prod (\rho p_j)^{x_j}}{\prod x_j!} \exp\{-\rho\}}{\rho^{n} \exp\{-\rho\}/n!}
  = \binom{n}{x_1 \cdots x_r} p_1^{x_1} \cdots p_r^{x_r}.$$

The conditional model given $\sum x_j = n$ is the multinomial model in a
preceding paragraph (see Figure 1). The multinomial model is obtained
regardless of the value of $\rho$. The choice, $\rho = n$, is a simple and convenient
choice: the vector of Poisson variables then has mean $(np_1, \ldots, np_r)$; the
linear constraint $\sum x_j = n$ passes through the vector mean; and the vector
mean of the Poisson variables is also the vector mean of the multinomial
variable.
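
The conditioning argument can be checked numerically for a small case. The sketch below is illustrative: the function names and the particular probabilities, frequencies, and Poisson totals are hypothetical choices, not values from the text.

```python
import math

def poisson_pmf(k, mean):
    """Ordinary Poisson probability at k, computed on the log scale."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def multinomial_pmf(xs, ps):
    """Multinomial probability n!/(x_1!...x_r!) p_1^x_1 ... p_r^x_r."""
    coef = math.factorial(sum(xs))
    for x in xs:
        coef //= math.factorial(x)
    return coef * math.prod(p ** x for p, x in zip(ps, xs))

def conditional_poisson_pmf(xs, ps, rho):
    """Independent Poissons with means rho * p_j, conditioned on their total.

    The total of the Poisson variables is itself Poisson with mean rho,
    which gives the denominator rho^n exp(-rho) / n!.
    """
    joint = math.prod(poisson_pmf(x, rho * p) for x, p in zip(xs, ps))
    return joint / poisson_pmf(sum(xs), rho)

ps = (0.2, 0.3, 0.5)   # hypothetical probabilities
xs = (1, 1, 2)         # hypothetical frequencies, n = 4
for rho in (4.0, 7.5):
    # The conditional probability should not depend on rho.
    print(rho, conditional_poisson_pmf(xs, ps, rho), multinomial_pmf(xs, ps))
```

The same value appears for both choices of $\rho$, which is the sense in which the multinomial is obtained regardless of the Poisson total mean.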

The composite multinomial is a combination of independent multinomials.
The ith component multinomial can be obtained from Poisson
variables with means $n_i p_{1i}, \ldots, n_i p_{r_i i}$ by imposing the condition $\sum_j x_{ji} = n_i$.
The composite multinomial can then be obtained from k batches of Poisson
variables ($i = 1, \ldots, k$) by imposing the indicated constraint on each batch.

Now consider the k batches of Poisson variables but with $r_1 = \cdots = r_k = r$
and with $\mathbf{p}_1 = \cdots = \mathbf{p}_k = \mathbf{p}$. The composite multinomial obtained by the
conditions in the preceding paragraph can be further conditioned by
$\sum_i x_{1i} = m_1, \ldots, \sum_i x_{ri} = m_r$. The resulting conditional model is the
hypergeometric model described in a preceding paragraph. The hypergeometric
does not depend on the vector p; the vector p in the Poissons and
multinomials can then be chosen so that the Poisson vector mean is also the
mean of the hypergeometric ($\sum_i n_i p_1 = m_1, \ldots, \sum_i n_i p_r = m_r$).

Figure 1 The possible values for two Poisson variables $(x_1, x_2)$. The possible values for a
binomial variable $(x_1, x_2)$ with n = 3. The Poisson variables conditional on $x_1 + x_2 = 3$
give a binomial variable.

The hypergeometric model can also be obtained from independent Poisson
variables and an intermediate single multinomial. Let $x_{ji}$ ($j = 1, \ldots, r$;
$i = 1, \ldots, k$) be independent Poisson variables with means $N p_{ji}$, where
$p_{ji} = m_j n_i / N^{2}$. A single multinomial is obtained from the condition $\sum x_{ji} = N$.
The hypergeometric is obtained from the stronger conditions $\sum_j x_{j1} = n_1, \ldots,
\sum_j x_{jk} = n_k$; $\sum_i x_{1i} = m_1, \ldots, \sum_i x_{ri} = m_r$.
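
A small enumeration illustrates the hypergeometric probabilities as a function of the table of frequencies. The sketch below assumes the factorial form of the probability, with rows indexed by kind and columns by sample; the function name and the two-by-two margins used are hypothetical.

```python
import math

def hypergeometric_pmf(table):
    """Probability of a table x_ji (row j = kind E_j, column i = sample i).

    Uses (prod n_i! * prod m_j!) / (N! * prod x_ji!) with the margins
    m_j (row totals) and n_i (column totals) computed from the table.
    """
    m = [sum(row) for row in table]
    n = [sum(col) for col in zip(*table)]
    numerator = (math.prod(math.factorial(v) for v in n)
                 * math.prod(math.factorial(v) for v in m))
    denominator = (math.factorial(sum(m))
                   * math.prod(math.factorial(x) for row in table for x in row))
    return numerator / denominator

# All tables with kind totals m = (2, 3) and sample sizes n = (3, 2).
tables = [[[a, 2 - a], [3 - a, a]] for a in range(3)]
probabilities = [hypergeometric_pmf(t) for t in tables]
print(probabilities, sum(probabilities))
```

The probabilities over all tables with the given margins add to one, and none of them involves the vector p.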

2 FREQUENCY MODELS: LARGE SAMPLES 

The common frequency models can be obtained by conditioning independent
Poisson variables. In Chapter Six the analysis of the stochastically
increasing model showed that the Poisson model, in a frequency application,
was approximated by a simple measurement model, provided the
location quantity was large. This section describes how the Poisson result
extends in a simple manner to cover the common frequency models, provided
again that the location quantities are large.

Consider t Poisson variables $x_1, \ldots, x_t$ with means $Np_1, \ldots, Np_t$; and
consider s linearly independent constraints with integer coefficients,

$$\sum l_{1i} x_i = c_1(N), \quad \ldots, \quad \sum l_{si} x_i = c_s(N),$$

where the constants $c_j(N)$ are such that the mean vector $(Np_1, \ldots, Np_t)$
satisfies the constraints. The conditional distribution of the x's is a frequency
model which for appropriate choice of constraints can be any of the models
described in Section 1. This section shows that the conditional distribution of
the x's, in a frequency application, can be approximated by a measurement
model:

The error variables $e_1 = \sqrt{x_1} - \sqrt{Np_1}, \ldots, e_t = \sqrt{x_t} - \sqrt{Np_t}$ have a
distribution that approaches (as $N \to \infty$) the distribution of a sample of t
from the normal with mean 0 and standard deviation $\tfrac12$ but conditioned by
constraints as expressed in terms of the e's; the constraints approach linear form
in any bounded range (as $N \to \infty$).

A realized sequence $e_1, \ldots, e_t$ from the error distribution provides the link
between the measurements $\sqrt{x_1}, \ldots, \sqrt{x_t}$ and the location quantities $\sqrt{Np_1},
\ldots, \sqrt{Np_t}$:

$$\sqrt{x_1} = \sqrt{Np_1} + e_1, \quad \ldots, \quad \sqrt{x_t} = \sqrt{Np_t} + e_t.$$

Let $e_1, \ldots, e_t$ be the error variable for the Poisson variables $x_1, \ldots, x_t$
(Section 2, Chapter Six):

$$e_1 = \sqrt{x_1} - \sqrt{Np_1}, \quad \ldots, \quad e_t = \sqrt{x_t} - \sqrt{Np_t};$$

and suppose $p_1 > 0, \ldots, p_t > 0$. By Section 3 in Chapter Six, the limiting
distribution of the e's (as $N \to \infty$) is that of a sample of t from the normal
with mean 0 and standard deviation $\tfrac12$. As part of the derivation it was shown
that the probability function for an error e (with a scale factor accommodating
the average spacing between e values) approaches uniformly the density
function for the normal with mean 0 and standard deviation $\tfrac12$. From this it
follows that the probability function for the error vector $\mathbf{e} = (e_1, \ldots, e_t)'$
(with a scale factor to accommodate the average spacing between e values)
approaches uniformly the density for a sample of t from the normal with
mean 0 and standard deviation $\tfrac12$. It follows that, if attention is restricted
to points e that satisfy the constraints, then the probability function for these
points (with the same scale factor) approaches uniformly the density for the
sample of t from the normal at these points. It then follows that the conditional
distribution of the e's is as described by the measurement model,
provided the following are established:

The constraints in terms of the e's become linear in any bounded range about
$(0, \ldots, 0)$ (as $N \to \infty$) (see Figure 2).

The spacing of the e points that satisfy the constraints becomes uniform in
any bounded range, and the spacing between adjacent points goes to zero (as
$N \to \infty$).

The next paragraph proves the first of these statements: The proof is simple 
and the results are needed to summarize this section. The next section is 
devoted to proving the second statement: The proof is somewhat long and is 
not of general statistical interest. 
Figure 2 The binomial variable $(x_1, x_2)$ with probabilities $p_1$, $p_2$, and n = 5, 10, 15, 20.
The approximating error variable $(e_1, e_2)$ with means 0, standard deviations $\tfrac12$, and
constraint $\sqrt{p_1}\, e_1 + \sqrt{p_2}\, e_2 = 0$.


Consider the form of the constraints,

$$\sum l_{1i} x_i = c_1(N), \quad \ldots, \quad \sum l_{si} x_i = c_s(N),$$

in the neighborhood of the point $(Np_1, \ldots, Np_t)$ which satisfies the constraints.
The error variables,

$$e_1 = \sqrt{x_1} - \sqrt{Np_1}, \quad \ldots, \quad e_t = \sqrt{x_t} - \sqrt{Np_t},$$

can be used to reexpress the constraints:

$$\sum l_{1i} (\sqrt{Np_i} + e_i)^{2} = c_1(N), \quad \ldots, \quad
  \sum l_{si} (\sqrt{Np_i} + e_i)^{2} = c_s(N).$$

These can be simplified (the point $(Np_1, \ldots, Np_t)$ satisfies the constraints):

$$\sum l_{1i} \sqrt{p_i}\, e_i = -\sum l_{1i} e_i^{2} / 2\sqrt{N}, \quad \ldots, \quad
  \sum l_{si} \sqrt{p_i}\, e_i = -\sum l_{si} e_i^{2} / 2\sqrt{N}.$$

For any finite range for e these approach the linear constraints

$$\sum l_{1i} \sqrt{p_i}\, e_i = 0, \quad \ldots, \quad \sum l_{si} \sqrt{p_i}\, e_i = 0$$

as $N \to \infty$ (see Figure 2).
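
For the binomial case, with the single constraint $x_1 + x_2 = n$ and N = n, the simplified form of the constraint can be checked directly. The sketch below is illustrative: the function name, the probability 0.3, and the way $x_1$ is chosen about one standard deviation above its mean are assumptions.

```python
import math

def constraint_sides(n, p1, x1):
    """Both sides of sqrt(p1) e1 + sqrt(p2) e2 = -(e1^2 + e2^2) / (2 sqrt(n)).

    Binomial case: one constraint x1 + x2 = n with unit coefficients.
    """
    p2, x2 = 1.0 - p1, n - x1
    e1 = math.sqrt(x1) - math.sqrt(n * p1)
    e2 = math.sqrt(x2) - math.sqrt(n * p2)
    lhs = math.sqrt(p1) * e1 + math.sqrt(p2) * e2
    rhs = -(e1 ** 2 + e2 ** 2) / (2.0 * math.sqrt(n))
    return lhs, rhs

p1 = 0.3
results = {}
for n in (20, 200, 2000):
    x1 = round(n * p1 + math.sqrt(n))   # keeps e1 in a bounded range as n grows
    results[n] = constraint_sides(n, p1, x1)
    print(n, x1, results[n])
```

The two sides agree exactly, and the right side shrinks toward zero as n increases with the errors held in a bounded range, which is the approach to the linear constraint.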

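Consider briefly the quantity p, and suppose that it depends in a continuously
differentiable way on a simpler quantity $\theta = (\theta_1, \ldots, \theta_r)'$. Let $\theta^{0}$
be a reference value for $\theta$ and consider the model for $\theta$ near $\theta^{0}$. The square
root of $p_i(\theta)$ can be expanded by Taylor's theorem:

$$\sqrt{p_i(\theta)} = \sqrt{p_i(\theta^{0})}
  + \sum_{u=1}^{r} (\theta_u - \theta_u^{0}) \frac{\partial \sqrt{p_i(\theta^{0})}}{\partial \theta_u} + \cdots.$$

Hence

$$(\sqrt{Np_1(\theta)}, \ldots, \sqrt{Np_t(\theta)})'
  = (\sqrt{Np_1(\theta^{0})}, \ldots, \sqrt{Np_t(\theta^{0})})'
  + (\theta_1 - \theta_1^{0}) \mathbf{v}_1 + \cdots + (\theta_r - \theta_r^{0}) \mathbf{v}_r + \cdots,$$

where†

$$\mathbf{v}_1 = \Big(\frac{\partial \sqrt{Np_1(\theta^{0})}}{\partial \theta_1}, \ldots,
  \frac{\partial \sqrt{Np_t(\theta^{0})}}{\partial \theta_1}\Big)', \quad \ldots, \quad
  \mathbf{v}_r = \Big(\frac{\partial \sqrt{Np_1(\theta^{0})}}{\partial \theta_r}, \ldots,
  \frac{\partial \sqrt{Np_t(\theta^{0})}}{\partial \theta_r}\Big)'.$$

If the location quantity

$$(\sqrt{Np_1(\theta)}, \ldots, \sqrt{Np_t(\theta)})'$$

is within two or three standard deviations of the reference value

$$(\sqrt{Np_1(\theta^{0})}, \ldots, \sqrt{Np_t(\theta^{0})})',$$

then the corresponding quantity $\theta$ differs from $\theta^{0}$ by an amount of order
$N^{-1/2}$. It follows that a first derivative approximation is appropriate as N
becomes large. It follows also that the deviation of the location quantity

$$(\sqrt{Np_1(\theta)}, \ldots, \sqrt{Np_t(\theta)})'$$

from the reference value

$$(\sqrt{Np_1(\theta^{0})}, \ldots, \sqrt{Np_t(\theta^{0})})'$$

is approximately linear in terms of structural vectors $\mathbf{v}_1, \ldots, \mathbf{v}_r$ with
coefficients $(\theta_1 - \theta_1^{0}), \ldots, (\theta_r - \theta_r^{0})$.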

The results in this section can now be summarized. Let $x_1, \ldots, x_t$ be
frequency variables that satisfy the linearly independent constraints

$$\sum l_{1i} x_i = c_1(N), \quad \ldots, \quad \sum l_{si} x_i = c_s(N)$$

with integer coefficients. And suppose the model for the frequencies
is that of independent Poisson variables $x_1, \ldots, x_t$ that satisfy the constraints
and have means $Np_1(\theta), \ldots, Np_t(\theta)$ that satisfy the constraints ($p_i > 0$ and
p continuously differentiable near $\theta^{0}$). Then, for N large, the frequency model
with $\theta$ near $\theta^{0}$ can be approximated by the following measurement model, the
simple regression model (Problem 17, Chapter Three):

$$e_1, \ldots, e_t,$$

$$\sqrt{x_1} - \sqrt{Np_1(\theta^{0})} = \sum_{u=1}^{r} (\theta_u - \theta_u^{0}) v_{u1} + e_1, \quad \ldots, \quad
  \sqrt{x_t} - \sqrt{Np_t(\theta^{0})} = \sum_{u=1}^{r} (\theta_u - \theta_u^{0}) v_{ut} + e_t.$$

The model has an error vector $(e_1, \ldots, e_t)$ which has the distribution of a
sample from the normal with mean 0 and standard deviation $\tfrac12$ but conditioned
to satisfy the constraints

$$\sum l_{1i} \sqrt{p_i(\theta^{0})}\, e_i = 0, \quad \ldots, \quad \sum l_{si} \sqrt{p_i(\theta^{0})}\, e_i = 0,$$

and the model has a structural equation in which a realized error vector provides
the link between the measurement deviation from the reference location and the
quantity deviations $\theta_1 - \theta_1^{0}, \ldots, \theta_r - \theta_r^{0}$ (as coefficients of the structural
vectors $\mathbf{v}_1, \ldots, \mathbf{v}_r$).

This form of measurement model can be analyzed in a straightforward
manner by the methods of Chapter Three as applied in Problems 17, 18 in
that chapter. Examples are given in Sections 4, 5, 6.

† Notation for the derivative at a particular point: $\partial f(\theta^{0})/\partial\theta = [\partial f(\theta)/\partial\theta]_{\theta=\theta^{0}}$.

The measurement model is an approximate model, a limiting model as
$N \to \infty$. In an application, with a given N, the curvature of the constraints
and the curvature of the mean vector $\mathbf{p}(\theta)$ as a function of $\theta$ may not be
negligible. Analysis in the neighborhood of a reference value $\theta^{0}$ may indicate
the quantity $\theta$ to be near a value $\theta^{(1)}$ beyond the range of reasonable linearity.
The value $\theta^{(1)}$ can then be used as a new reference value. Analysis in the
neighborhood of $\theta^{(1)}$ may then indicate the quantity $\theta$ to be near $\theta^{(2)}$. The
procedure can be repeated, forming steps of an iteration. After several steps,
in the typical application, the approximating model will describe probabilities
for $\theta$ in the linear neighborhood of the final reference point.

*3 THE UNIFORMITY PROOF 

This section completes the proof in the preceding section by establishing
the uniformity of points that satisfy certain linear constraints. The proof is
not of central statistical interest and may be omitted without affecting the
background for succeeding sections.

The range of the Poisson variable $\mathbf{x} = (x_1, \ldots, x_t)'$ is the set of nonnegative
integer or lattice points in $R^{t}$:

$$S^{+} = \{\mathbf{x}: x_i \ge 0,\ x_i = \text{integer},\ i = 1, \ldots, t\}.$$

For the earlier parts of the proof it is more convenient to work with the set
of all lattice points

$$S = \{\mathbf{x}: x_i = \text{integer},\ i = 1, \ldots, t\}.$$

The points of the lattice can be represented in terms of an Abelian group of
integer translations. Let

$$g_1 \mathbf{x} = (x_1 + 1, x_2, \ldots, x_t)', \quad \ldots, \quad
  g_t \mathbf{x} = (x_1, \ldots, x_{t-1}, x_t + 1)';$$

then

$$g_1^{y_1} \cdots g_t^{y_t} \mathbf{x} = (x_1 + y_1, \ldots, x_t + y_t)'$$

(each y is an integer). It follows that the set

$$G = \{g_1^{y_1} \cdots g_t^{y_t}: y_i = \text{integer}\ (i = 1, \ldots, t)\}$$

is an Abelian group. The group can be used to provide coordinates on S:
Let $\mathbf{0} = (0, \ldots, 0)'$ be the reference point; let

$$[\mathbf{x}] = g_1^{x_1} \cdots g_t^{x_t};$$

then the point x has position $[\mathbf{x}]$ relative to the reference point 0,

$$\mathbf{x} = [\mathbf{x}]\mathbf{0}.$$

The lattice points of S are uniformly spaced in $R^{t}$. The transformations
of G express this uniformity: A transformation $g_1^{y_1} \cdots g_t^{y_t}$ maps S onto itself
by a translation of $y_1$ units in the direction of the first axis, ..., $y_t$ units in the
direction of the tth axis. The group demonstrates that the set S has the same
form at all points: The set S is homogeneous relative to the translation group.

The linearly independent constraints

$$\sum l_{1i} x_i = c_1(N), \quad \ldots, \quad \sum l_{si} x_i = c_s(N)$$

can be examined for various values of the c's; and they can then be viewed as
giving s coordinates in a new coordinate system,

$$w_1 = l_{11} x_1 + \cdots + l_{1t} x_t, \quad \ldots, \quad w_s = l_{s1} x_1 + \cdots + l_{st} x_t, \quad \ldots, \quad
  w_t = l_{t1} x_1 + \cdots + l_{tt} x_t;$$

the remaining t − s coordinates can be based on a completing set of t − s
linearly independent constraints with integer coefficients. A lattice point x
(an x vector in S) can be represented in alternative coordinates as a w vector
(the corresponding w vector has integer coefficients). For an arbitrary x
vector in $R^{t}$ there is a corresponding w vector. An arbitrary x vector has a
neighboring lattice point; and correspondingly an arbitrary w vector has a
neighboring lattice point (see Figure 3).

Now consider the lattice points that satisfy the s constraints with right sides
set equal to zero:

$$S_c = \Big\{\mathbf{x}: \sum_i l_{ji} x_i = 0,\ \mathbf{x} \in S,\ j = 1, \ldots, s\Big\}.$$

Figure 3 New coordinates $w_1 = x_1 + x_2$, $w_2 = x_2 - x_1$. A point in old coordinates
(1, 2), the point in new coordinates (3, 1). The set $S_c$ based on the condition $x_1 + x_2 = 0$:
$S_c$ is the orbit of a subgroup $G_c = \{g_1^{y} g_2^{-y}: y = \text{integer}\}$.
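
The change of coordinates in Figure 3 can be written out for the two-dimensional case. The sketch below is illustrative; the function name and the representation of the coefficient rows as tuples are assumptions.

```python
def to_w(x, coefficient_rows):
    """New coordinates w_j = l_j1 x_1 + ... + l_jt x_t for integer coefficient rows."""
    return tuple(sum(l * xi for l, xi in zip(row, x)) for row in coefficient_rows)

# Figure 3: w1 = x1 + x2 (the constraint), w2 = x2 - x1 (a completing coordinate).
rows = [(1, 1), (-1, 1)]

print(to_w((1, 2), rows))  # (3, 1)

# Lattice points on the orbit of the subgroup for x1 + x2 = 0 have the
# form (y, -y); each has first new coordinate zero.
orbit = [(y, -y) for y in range(-3, 4)]
print([to_w(point, rows) for point in orbit])
```

Points on the orbit differ only in the completing coordinate $w_2$, which moves in equal steps; this is the uniform spacing used in the proof.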
In terms of the group coordinates this is the set

$$G_c = \Big\{g_1^{y_1} \cdots g_t^{y_t}: \sum_i l_{ji} y_i = 0,\ j = 1, \ldots, s\Big\}.$$

But the set $G_c$ is closed under the formation of products and inverses; it is a
subgroup. A subgroup in an Abelian group is a normal subgroup.

The subgroup $G_c$ partitions the set S into orbits $G_c \mathbf{x}$ (x in S). The orbit
through the origin 0 is

$$G_c \mathbf{0} = \Big\{\mathbf{x}: \sum_i l_{ji} x_i = 0,\ \mathbf{x} \in S,\ j = 1, \ldots, s\Big\};$$

the orbit through a point $\mathbf{x}^{0}$ is

$$G_c \mathbf{x}^{0} = \Big\{\mathbf{x}: \sum_i l_{ji} x_i = \sum_i l_{ji} x_i^{0},\ \mathbf{x} \in S,\ j = 1, \ldots, s\Big\}.$$

Any translation in $G_c$ carries an orbit into itself: The orbit is homogeneous
under the group $G_c$; the orbit has the same form at all points. The group $G_c$
generates points spread uniformly in the (t − s)-dimensional subspace:

$$\Big\{\mathbf{x}: \sum_i l_{ji} x_i = 0,\ \mathbf{x} \in R^{t},\ j = 1, \ldots, s\Big\};$$

and by translation the group $G_c$ generates points spread uniformly in the
(t − s)-dimensional subset:

$$\Big\{\mathbf{x}: \sum_i l_{ji} x_i = \sum_i l_{ji} x_i^{0},\ j = 1, \ldots, s\Big\}$$

passing through a point $\mathbf{x}^{0}$ of S.

Now consider how the uniformity of lattice points x in a subset satisfying
the constraints carries over to uniformity of the corresponding e points in a
bounded neighbourhood of 0. The volume change from a point e to an
original point x is

$$\prod_{i=1}^{t} 2(\sqrt{Np_i} + e_i) = 2^{t} N^{t/2} \prod_{i=1}^{t} (\sqrt{p_i} + e_i/\sqrt{N})
  = 2^{t} N^{t/2} h_N(\mathbf{e}),$$

where $h_N(\mathbf{e})$ approaches $\prod \sqrt{p_i}$ uniformly in any bounded range as $N \to \infty$.
Thus the uniformity of the constrained lattice points in a region surrounding
$(Np_1, \ldots, Np_t)$ becomes uniformity for the points e about 0; and the spacing
between e points goes to zero as $N \to \infty$.

This establishes the uniformity property required in the preceding section. 
4 THE MULTINOMIAL MODEL 

Consider an example involving a multinomial model. The theory in
Section 2 relates the multinomial model to a measurement model with normal
errors and with structural vectors; the methods of Chapter Three can then
be applied.

Consider two factors affecting the breeding of maize: a first factor,
Starchy S or sugary s; a second factor, Green G or white g. The data record
the classification of n = 3839 progeny of self-fertilized heterozygotes:

            G       g     Total
   S      1997     906     2903
   s       904      32      936
   Total  2901     938     3839.

The accepted theory for the example prescribes marginal probabilities in
the ratio 3 : 1 for S to s and in the ratio 3 : 1 for G to g; but it allows for a
genetic factor linkage involving a quantity $\theta$ so that cell probabilities can
differ from the independence pattern,

            G       g
   S      9/16    3/16    3/4
   s      3/16    1/16    1/4
          3/4     1/4     1,

and have the form

            G                    g
   S      $(2 + \theta)/4$     $(1 - \theta)/4$     3/4
   s      $(1 - \theta)/4$     $\theta/4$           1/4
          3/4                  1/4                  1.

The linkage quantity $\theta$ can have any value in the interval (0, 1); the value
$\theta = \tfrac14$ corresponds to independence (no linkage). This suggests analyzing the
model with linkage quantity, but allowing for the possibility of withdrawal
to the simpler model corresponding to independence.






For a reference value $\theta^{0} = 0.1$ the measurement vector and the structural
vector are

        y              v(0.1)
   44.687,806      10.689,058
   30.099,834     −16.327,805
   30.066,593     −16.327,805
    5.656,854      48.983,416.

The structural equation for the measurement model at $\theta^{0} = 0.1$ is

$$\mathbf{y} - \boldsymbol{\tau}(0.1) = (\theta - 0.1)\,\mathbf{v}(0.1) + \mathbf{e}.$$

The appropriate regression coefficient is

$$b^{(1)} = \frac{(\mathbf{y} - \boldsymbol{\tau}(0.1), \mathbf{v}(0.1))}{(\mathbf{v}(0.1), \mathbf{v}(0.1))}
  = \frac{(\mathbf{y}, \mathbf{v}(0.1))}{(\mathbf{v}(0.1), \mathbf{v}(0.1))}
  = \frac{-227.623{,}1}{3046.825{,}3} = -0.074{,}708$$

(the simplification from the measurement deviation $\mathbf{y} - \boldsymbol{\tau}(0.1)$ to measurement
y is based on the orthogonality property:

$$\sum \tau_i^{2}(\theta) = n, \qquad
  \sum 2\tau_i(\theta) \frac{\partial \tau_i(\theta)}{\partial \theta} = 0, \qquad
  (\boldsymbol{\tau}(\theta), \mathbf{v}(\theta)) = 0;$$

the same kind of simplification will occur throughout the examples). The
corresponding $\theta$ value based on linearity at the reference point is then

$$\theta^{(1)} = 0.1 + b^{(1)} = 0.025{,}292.$$
The value $\theta^{(1)} = 0.025{,}292$ can be used as a new reference value. The
measurement vector y and the structural vector $\mathbf{v}(\theta^{(1)})$ are

        y              v
   44.687,806      10.884,419
   30.099,834     −15.689,595
   30.066,593     −15.689,595
    5.656,854      97.400,229;

the corresponding regression coefficient and value of the quantity are

$$b^{(2)} = \frac{(\mathbf{y}, \mathbf{v})}{(\mathbf{v}, \mathbf{v})} = \frac{93.392{,}834}{10{,}097.601} = 0.009{,}249, \qquad
  \theta^{(2)} = 0.034{,}541.$$

The nonlinearity is prominent in the fourth coordinate with mean $\tfrac12 \sqrt{n} \sqrt{\theta}$
and with $\theta$ near zero. Three further iterations effectively overcome this
nonlinearity:

    i      $b^{(i)}$       $\theta^{(i)}$
    0                      0.100,000
    1     −0.074,708       0.025,292
    2      0.009,249       0.034,541
    3      0.001,094       0.035,635
    4      0.000,042       0.035,677
    5      0.000,001       0.035,678.

The variance of the regression coefficient for error can be obtained in part 
from the inverse “matrix” for the last iteration: 


v 7348.777,2 | 0.009,168 j 1 

y 1 | 0.000,001 j 0.000,136,077. 

The basic error variance is $\tfrac14$; the error variables satisfy one linear constraint;
the normal error distribution is rotationally symmetric; the error variance is
therefore $\tfrac14$ for three orthonormal variables in the subspace satisfying the
constraint. It follows that the variance of the regression coefficient for the
error variable is

$$\tfrac14 (0.000{,}136) = 0.000{,}034,$$

and its standard deviation is 0.005,83.
The table records the squared lengths of the difference vectors.

The results can be summarized in an analysis-of-variance table.

   Source                     Dimension    Component        $\chi^{2}$
   Linkage                        1        123.859,481      495.438
   Deviations                     2          0.505,108        2.020
   Error ($\sigma_e^{2} = \tfrac14$)
   Total                          3

The error variance is $\sigma_e^{2} = \tfrac14$. The components can accordingly be adjusted
to give chi-square values:

$$\chi^{2} = \frac{\text{Component}}{1/4} = 4 \cdot \text{Component}.$$

The observed chi-square value 2.020 falls between the 40% point (1.83)
and the 30% point (2.41) for a chi-square variable on two degrees of freedom.
The observed value 2.020 is thus a reasonable value for such a variable, and it
indicates that the data are in accord with the linkage model.

The chi-square value 495.438 is an extreme value: For a chi-square
on one degree of freedom the 1% point is 6.635 and the 0.1% point is 10.83.
Thus within the linkage model there is strong evidence that $\theta$ is different
from the independence value $\tfrac14$. The distribution for the quantity $\theta$ is normal with mean
0.035,678 and standard deviation 0.005,83.

Consider briefly the special multinomial model, the binomial model. For
the binomial model the frequency variables are $x_1$ and $n - x_1 = x_2$. The
probabilities are p and 1 − p = q. The corresponding measurement variables
and location quantities are

$$y_1 = \sqrt{x_1}, \qquad \tau_1 = \sqrt{np},$$

$$y_2 = \sqrt{n - x_1} = \sqrt{x_2}, \qquad \tau_2 = \sqrt{nq}.$$

The approximating measurement model is

$$y_1 = \tau_1 + e_1,$$

$$y_2 = \tau_2 + e_2;$$

the error variables $e_1$, $e_2$ are normal with mean 0, with standard deviation $\tfrac12$
and subject to the constraint

$$\sqrt{p^{0}}\, e_1 + \sqrt{1 - p^{0}}\, e_2 = 0$$

in the neighborhood of $p^{0}$, $1 - p^{0}$ (see Figure 4).

An alternative approximating model can be formed by having a single
error variable and no constraint. The position of points on the quarter circle
can be described by distance along the arc commencing at $(0, \sqrt{n})$:

\[ \text{measurement} = \sqrt{n}\,\sin^{-1}\sqrt{x_1/n}, \]
\[ \text{location quantity} = \sqrt{n}\,\sin^{-1}\sqrt{p}. \]

Let $e$ be a normal error variable with mean 0 and standard deviation $\tfrac{1}{2}$. The
alternative model is

\[ \sqrt{n}\,\sin^{-1}\sqrt{x_1/n} = \sqrt{n}\,\sin^{-1}\sqrt{p} + e. \]

The structural equation can conveniently be used in the form

\[ \sin^{-1}\sqrt{x_1/n} = \sin^{-1}\sqrt{p} + \frac{e}{\sqrt{n}}. \]

The transformation $\sin^{-1}\sqrt{t}$ is recorded in most collections of statistical
tables.
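
As a rough numerical illustration of the angular transformation, the following sketch computes the exact variance of $\sin^{-1}\sqrt{x_1/n}$ under a binomial distribution and compares it with the value $1/(4n)$ implied by an error standard deviation of $\tfrac{1}{2}$ on the $\sqrt{n}$ scale. The function name and the choice $n = 200$, $p = 0.3$ are assumptions made for illustration only.

```python
import math

def arcsine_variance(n, p):
    """Exact variance of asin(sqrt(x/n)) when x is binomial(n, p)."""
    q = 1.0 - p
    mean = 0.0
    mean_sq = 0.0
    for x in range(n + 1):
        prob = math.comb(n, x) * p**x * q**(n - x)   # binomial probability
        t = math.asin(math.sqrt(x / n))               # angular transformation
        mean += prob * t
        mean_sq += prob * t * t
    return mean_sq - mean * mean

n, p = 200, 0.3   # illustrative values, not taken from the text
ratio = arcsine_variance(n, p) / (1.0 / (4 * n))
print(f"variance relative to 1/(4n): {ratio:.3f}")
```

A ratio near 1 is consistent with treating the transformed frequency as having error variance $1/(4n)$, whatever the value of $p$ away from 0 and 1.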

5 THE COMPOSITE MULTINOMIAL MODEL 

Several independent multinomial models can be combined to form a 
composite multinomial model. The methods of analysis for the multinomial 
model in the preceding section indicate the pattern for the composite 
multinomial. 

For example, consider some data on blood types. A sample of 353 people
from a community C are classified by blood phenotype: O, A, B, AB; and a
sample of 364 people from a second community D are similarly classified:

              C       D     Total

O            121     118     239
A            120      95     215
B             79     121     200
AB            33      30      63

Total        353     364     717.


The observable phenotype corresponds to an unobservable or latent genotype.
An O gene is recessive to an A gene and to a B gene. Let $p, q, r$ be the
gene probabilities corresponding to A, B, O ($p + q + r = 1$). With random
mating the genotype and phenotype probabilities are

Genotype                  Phenotype

OO       $r^2$            O        $r^2$

AA       $p^2$
AO       $2pr$            A        $p(p + 2r)$

BB       $q^2$
BO       $2qr$            B        $q(q + 2r)$

AB       $2pq$            AB       $2pq$


The model contains effectively two quantities $p, q$ ($r = 1 - p - q$). The
analysis first allows different gene probabilities in the two communities
(in effect, four quantities). Withdrawal can then be made to the model having
the same gene probabilities in the two communities (two quantities).

Consider first the phenotype model for community C:

        Observed   √Observed      √Mean                ∂τ/∂p          ∂τ/∂q
           x           y            τ                    v1             v2

O         121     11.000,000   √(n r²)             -353 r/τ_O     -353 r/τ_O
A         120     10.954,451   √(n p(p + 2r))       353 r/τ_A     -353 p/τ_A
B          79      8.888,194   √(n q(q + 2r))      -353 q/τ_B      353 r/τ_B
AB         33      5.744,563   √(n 2pq)             353 q/τ_AB     353 p/τ_AB

          353     18.788,294

The derivatives are straightforward; for example,

\[ \frac{\partial r^2}{\partial p} = \frac{\partial}{\partial p}(1 - p - q)^2 = -2r, \]

\[ \frac{\partial (p^2 + 2pr)}{\partial p} = \frac{\partial}{\partial p}\,[\,p^2 + 2p(1 - p - q)\,] = 2r. \]

As an initial reference point consider $(p^0, q^0) = (\tfrac{1}{3}, \tfrac{1}{3})$; the structural vectors and
corresponding regression coefficients for a first calculation, based on this
reference point, are:


     y               v1              v2

11.000,000     -18.788,294     -18.788,294
10.954,451      10.847,427     -10.847,427
 8.888,194     -10.847,427      10.847,427
 5.744,563      13.285,330      13.285,330

\[ (v_1, v_1)\,b_1 + (v_1, v_2)\,b_2 = (v_1, y), \]
\[ (v_2, v_1)\,b_1 + (v_2, v_2)\,b_2 = (v_2, y); \]

\[ b_1^{(1)} = -0.075,469, \qquad b_2^{(1)} = -0.170,711. \]

The corresponding values for $p, q$ based on linearity at $p^0 = \tfrac{1}{3}$, $q^0 = \tfrac{1}{3}$ are

\[ p^{(1)} = 0.333,333 - 0.075,469 = 0.257,864, \qquad q^{(1)} = 0.333,333 - 0.170,711 = 0.162,622. \]

Iteration yields successive reference points (differences in parentheses):

i        p^(i)              q^(i)

0      0.333,333          0.333,333
      (-0.075,469)       (-0.170,711)
1      0.257,864          0.162,622
      (-0.011,691)       (0.010,465)
2      0.246,173          0.173,087
      (0.000,264)        (-0.000,004)
3      0.246,437          0.173,083
      (0.000,000)        (0.000,003)
4      0.246,437          0.173,086.

The variances and covariance for the error regression coefficients are equal
to the appropriate elements of the inverse matrix multiplied by $\tfrac{1}{4}$. The inverse
matrix at the last iteration is

\[
\begin{pmatrix} (v_1, v_1) & (v_1, v_2) \\ (v_2, v_1) & (v_2, v_2) \end{pmatrix}^{-1}
= \begin{pmatrix} 865.709,17 & 230.305,04 \\ 230.305,04 & 1,181.253,27 \end{pmatrix}^{-1}
= \begin{pmatrix} 0.001,218 & -0.000,238 \\ -0.000,238 & 0.000,893 \end{pmatrix}.
\]

The variances for the error regression coefficients for $p, q$ are 0.000,304,
0.000,223; the standard deviations are 0.0174, 0.0149; and the correlation is
-0.23.
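
The iteration for community C can be sketched in a few lines. The code below is an illustrative reading of the calculation described above, not the program used for the book: it forms the square-root measurements, the location vector and its two derivative vectors at the current reference point, and solves the two normal equations. All function names are hypothetical. Because the structural vectors are orthogonal to the location vector, using $y - \tau$ on the right-hand side gives the same coefficients as using $y$.

```python
import math

def structural_vectors(n, p, q):
    """Location vector tau and derivative vectors v1 = d tau/dp, v2 = d tau/dq."""
    r = 1.0 - p - q
    probs = [r * r, p * (p + 2 * r), q * (q + 2 * r), 2 * p * q]  # O, A, B, AB
    tau = [math.sqrt(n * pr) for pr in probs]
    dp = [-2 * r, 2 * r, -2 * q, 2 * q]   # derivatives of the probabilities in p
    dq = [-2 * r, -2 * p, 2 * r, 2 * p]   # derivatives of the probabilities in q
    v1 = [n * d / (2 * t) for d, t in zip(dp, tau)]
    v2 = [n * d / (2 * t) for d, t in zip(dq, tau)]
    return tau, v1, v2

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def step(counts, p, q):
    """One regression step from the reference point (p, q); returns (b1, b2)."""
    n = sum(counts)
    y = [math.sqrt(x) for x in counts]
    tau, v1, v2 = structural_vectors(n, p, q)
    resid = [yi - ti for yi, ti in zip(y, tau)]
    a11, a12, a22 = dot(v1, v1), dot(v1, v2), dot(v2, v2)
    g1, g2 = dot(v1, resid), dot(v2, resid)
    det = a11 * a22 - a12 * a12
    return (a22 * g1 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det

counts_c = [121, 120, 79, 33]      # community C: O, A, B, AB
p, q = 1.0 / 3.0, 1.0 / 3.0         # initial reference point
for i in range(1, 7):
    b1, b2 = step(counts_c, p, q)
    p, q = p + b1, q + b2
    print(i, round(p, 6), round(q, 6))
```

Replacing `counts_c` by the frequencies for community D or for the combined sample, with a suitable starting point, gives the corresponding iterations.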

The analysis for community D proceeds similarly:

        Observed   √Observed      √Mean                ∂τ/∂p          ∂τ/∂q
           x           y            τ                    v1             v2

O         118     10.862,780   √(n r²)             -364 r/τ_O     -364 r/τ_O
A          95      9.746,794   √(n p(p + 2r))       364 r/τ_A     -364 p/τ_A
B         121     11.000,000   √(n q(q + 2r))      -364 q/τ_B      364 r/τ_B
AB         30      5.477,226   √(n 2pq)             364 q/τ_AB     364 p/τ_AB

          364     19.078,784


As an initial reference point consider $(p, q) = (0.2, 0.2)$, a point suggested
by the final reference point for community C:

     y               v1              v2

10.862,780     -19.078,784     -19.078,784
 9.746,794      21.633,308      -7.211,103
11.000,000      -7.211,103      21.633,308
 5.477,226      13.490,738      13.490,738

\[ b_1^{(1)} = -0.009,223, \qquad b_2^{(1)} = 0.034,224, \]
\[ p^{(1)} = 0.190,777, \qquad q^{(1)} = 0.234,224. \]

Successive iterations provide a check for nonlinearity:

i        p^(i)              q^(i)

0      0.200,000          0.200,000
      (-0.009,223)       (0.034,224)
1      0.190,777          0.234,224
      (-0.000,565)       (0.001,453)
2      0.190,212          0.235,677
      (-0.000,017)       (0.000,011)
3      0.190,195          0.235,688.


The variances and covariance for the error regression coefficients are equal
to the appropriate elements of the inverse matrix multiplied by $\tfrac{1}{4}$. The inverse
matrix at the last iteration is

\[
\begin{pmatrix} 1122.751,62 & 238.859,21 \\ 238.859,21 & 930.472,18 \end{pmatrix}^{-1}
= \begin{pmatrix} 0.000,942 & -0.000,242 \\ -0.000,242 & 0.001,137 \end{pmatrix}.
\]

The variances for the error regression coefficients for $p, q$ are 0.000,236,
0.000,284; the standard deviations are 0.0154, 0.0169; and the correlation is
-0.23.

Now consider the model based on having the same gene probabilities for
the two communities. Points having the same probability function should be
combined; accordingly the data for the two communities are combined:

        Observed   √Observed      √Mean                ∂τ/∂p          ∂τ/∂q
           x           y            τ                    v1             v2

O         239     15.459,625   √(n r²)             -717 r/τ_O     -717 r/τ_O
A         215     14.662,878   √(n p(p + 2r))       717 r/τ_A     -717 p/τ_A
B         200     14.142,136   √(n q(q + 2r))      -717 q/τ_B      717 r/τ_B
AB         63      7.937,254   √(n 2pq)             717 q/τ_AB     717 p/τ_AB

          717     26.776,856


As an initial reference point consider $(p, q) = (0.22, 0.20)$; from the
reference value $p = 0.22$, $q = 0.20$, $r = 0.58$, successive iterations give

i        p^(i)              q^(i)

0      0.220,000          0.200,000
      (-0.002,379)       (0.004,327)
1      0.217,621          0.204,327
      (-0.000,007)       (0.000,025)
2      0.217,614          0.204,352.

The observed and fitted values can be compared by means of vectors in $R^8$:

            Measurement   Difference   Fitted Location   Difference   Fitted Location
                 y

Community C (n = 353)                  p = 0.246,437                  p = 0.217,614
                                       q = 0.173,086                  q = 0.204,352

O             11.0000       0.0938        10.9062          0.0459        10.8603
A             10.9545      -0.1104        11.0649          0.7925        10.2724
B              8.8882      -0.1400         9.0282         -0.8781         9.9063
AB             5.7446       0.2570         5.4876         -0.1156         5.6032

Community D (n = 364)                  p = 0.190,195                  p = 0.217,614
                                       q = 0.235,688                  q = 0.204,352

O             10.8628      -0.0907        10.9535         -0.0747        11.0282
A              9.7468       0.1208         9.6260         -0.8053        10.4313
B             11.0000       0.1038        10.8962          0.8367        10.0595
AB             5.4772      -0.2354         5.7126          0.0228         5.6898

Squared lengths: deviations for community C, 0.1066; deviations for
community D, 0.0890; between communities, 2.7693.

The fitted vector for the combined communities uses n = 353 for the first
four coordinates (community C) and n = 364 for the last four coordinates
(community D). The difference vectors are recorded together with squared
lengths. The analysis-of-variance table is

Source                            Dimension    Component      χ²

Between communities                   2          2.7693     11.077
Deviations for community C            1          0.1066      0.426
Deviations for community D            1          0.0890      0.356
Error ($\sigma_0^2 = \tfrac{1}{4}$)

Total                                 4

The chi-square values 0.426 and 0.356 are close to the 50% value (0.455)
for a chi-square variable on one degree of freedom. These values are
reasonable values; they indicate for each community that the data are in
accord with the model.

The chi-square value 11.077 is beyond the 1% value (9.21) for a chi-square
variable on two degrees of freedom. This is moderately strong
evidence that the gene frequencies are different in the two communities.

The structural distribution for p and q in community C is normal:

                    Standard
         Mean       Deviation     Correlation

p       0.2464       0.0174
                                     -0.23
q       0.1731       0.0149

The structural distribution for p and q in community D is normal:

                    Standard
         Mean       Deviation     Correlation

p       0.1902       0.0154
                                     -0.23
q       0.2357       0.0169

The composite multinomial model can arise without the specialized
structure involving a simpler quantity $\theta$. For $i = 1, \ldots, k$, let $(x_{1i}, \ldots, x_{ri})$
be multinomial with total frequency $n_i$ and quantity $(p_{1i}, \ldots, p_{ri})$. Consider
a test of significance for the equality of the multinomial probability vectors:

\[ (p_{11}, \ldots, p_{r1}) = \cdots = (p_{1k}, \ldots, p_{rk}). \]

First consider the general model that allows different probability vectors
in the different component multinomials. In each component the location
quantity can be fitted exactly to the measurement vector; the $(ji)$th coordinate
for the fitted location vector is $\sqrt{x_{ji}}$.

Now consider the restricted model having the same probability vector for
each component multinomial. Points with the same probability function
should be combined; accordingly the component multinomials should be
combined into a single multinomial. Let $m_j = \sum_i x_{ji}$; let $p_j$ designate the
corresponding probability; and let $n = \sum_i n_i$. The location quantity for the
single combined multinomial can be fitted exactly to the measurement vector;
the $j$th coordinate for the fitted location vector is $\sqrt{m_j}$. In order to compare
this fitted location vector with the vector for the general model it is necessary
to reexpress it in $R^{rk}$: the $(ji)$th coordinate for the location vector is then

\[ \sqrt{\frac{n_i\, m_j}{n}}. \]

The squared length of the difference vector is

\[ \sum_{i=1}^{k} \sum_{j=1}^{r} \left( \sqrt{x_{ji}} - \sqrt{\frac{n_i\, m_j}{n}} \right)^2; \]

the corresponding dimension is

\[ (r - 1)k - (r - 1) = (r - 1)(k - 1). \]

The chi-square value

\[ \chi^2 = 4 \sum_{i=1}^{k} \sum_{j=1}^{r} \left( \sqrt{x_{ji}} - \sqrt{\frac{n_i\, m_j}{n}} \right)^2 \]

can be compared with the chi-square distribution on $(r - 1)(k - 1)$ degrees
of freedom and the hypothesis of equal probability vectors assessed
accordingly.
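
The general test of equal probability vectors can be tried out numerically. The sketch below applies it to the blood-type frequencies of this section, ignoring the gene structure, so that it has $(4 - 1)(2 - 1) = 3$ degrees of freedom. The function name and the layout of the table (one row per sample) are assumptions made for the illustration.

```python
import math

def homogeneity_chi_square(table):
    """table[i][j] is the frequency of category j in sample i.

    Returns the square-root chi-square value and its degrees of freedom.
    """
    n_i = [sum(row) for row in table]            # sample totals
    m_j = [sum(col) for col in zip(*table)]      # combined category totals
    n = sum(n_i)
    total = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            fitted_mean = n_i[i] * m_j[j] / n    # fitted mean under equal vectors
            total += (math.sqrt(x) - math.sqrt(fitted_mean)) ** 2
    k, r = len(table), len(m_j)
    return 4.0 * total, (r - 1) * (k - 1)

blood = [[121, 120, 79, 33],    # community C: O, A, B, AB
         [118, 95, 121, 30]]    # community D: O, A, B, AB
chi2, df = homogeneity_chi_square(blood)
print(f"chi-square = {chi2:.3f} on {df} degrees of freedom")
```

The value obtained this way is close to the sum of the three chi-square values in the analysis-of-variance table above, which is what the decomposition of the difference vector would suggest.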

If the component multinomials are in fact binomials, then the angular 
transformation mentioned at the end of the preceding section can simplify 
the analysis. 


6 THE HYPERGEOMETRIC MODEL 

Consider a single multinomial model with rk outcome events arranged in 
r rows and k columns. The hypergeometric model arises in a test of the 
independence of the row and column categories. 

Let $x_{ji}$ be the observed frequency for the cell in the $j$th row and $i$th column.
The measurement vector has $(ji)$th coordinate

\[ y_{ji} = \sqrt{x_{ji}}. \]

For the general model allowing unrestricted probabilities for the cells the
location quantity can be fitted exactly to the measurement vector. The fitted
location vector has $(ji)$th coordinate

\[ \hat{\tau}_{ji} = \sqrt{x_{ji}}. \]


For the restricted model with independence the probability for the $(ji)$th
cell is $p_{ji} = p_j^{(1)} p_i^{(2)}$, where $p_j^{(1)}$ is the probability for the event of the $j$th row
and $p_i^{(2)}$ is the probability for the event of the $i$th column. With independence
the row totals provide frequencies for the row probabilities, and the column
totals provide frequencies for the column probabilities. The fitted quantity
corresponding to rows is

\[ \hat{p}_j^{(1)} = \frac{\sum_i x_{ji}}{n}. \]

The fitted quantity corresponding to columns is

\[ \hat{p}_i^{(2)} = \frac{\sum_j x_{ji}}{n}. \]

The difference vector corresponding to the withdrawal from the general model
to the restricted model has $(ji)$th coordinate

\[ \sqrt{x_{ji}} - \sqrt{n\, \hat{p}_j^{(1)} \hat{p}_i^{(2)}}. \]

The corresponding chi-square value is

\[ \chi^2 = 4 \sum_{j=1}^{r} \sum_{i=1}^{k} \left( \sqrt{x_{ji}} - \sqrt{n\, \hat{p}_j^{(1)} \hat{p}_i^{(2)}} \right)^2. \]

This value can be compared with the chi-square distribution on

\[ rk - 1 - (r - 1) - (k - 1) = (r - 1)(k - 1) \]

degrees of freedom, and the hypothesis of independence can be assessed
accordingly.

NOTES AND REFERENCES 

The traditional analysis of frequency data is based on a chi-square measure 
proposed by Karl Pearson (1900): 

\[ \chi^2 = \sum \frac{(\text{observed frequency} - \text{fitted mean})^2}{\text{fitted mean}}. \]


Some detailed discussion concerning the fitted mean and the appropriate 
degrees of freedom may be found in Fisher (1958, 1959) and Rao (1965). 

The approximate measurement model in this chapter leads to a chi-square 
measure having the form 

\[ \chi^2 = 4 \sum \left( \sqrt{\text{observed}} - \sqrt{\text{fitted mean}} \right)^2. \]

These measures can be related to the Poisson distribution with variable $x$
and quantity $\theta$. The Karl Pearson measure derives from the approximate
normality of

\[ z = \frac{x - \theta}{\sqrt{\theta}}. \]

The measure in this chapter derives from the approximate normality of

\[ e^* = 2\,(\sqrt{x} - \sqrt{\theta}). \]

The variables $e^*$ and $z$ were compared in detail in Chapter Six. The variable
$e^*$ presents the frequency in a location relationship to the quantity; this is
essential in frequency applications in which an increase in the quantity
produces an increase in the frequency. In addition, the variable $e^*$ approaches
normality more rapidly over most of its range.
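
For a feel of how the two measures compare in an ordinary case, the following sketch evaluates both on a small set of frequencies with a common fitted mean. The frequencies are hypothetical and chosen only for illustration; the function names are likewise assumptions.

```python
import math

def pearson_measure(observed, fitted):
    """Karl Pearson's measure: sum of (observed - fitted)^2 / fitted."""
    return sum((x - m) ** 2 / m for x, m in zip(observed, fitted))

def square_root_measure(observed, fitted):
    """The measure of this chapter: 4 * sum of (sqrt(observed) - sqrt(fitted))^2."""
    return 4.0 * sum((math.sqrt(x) - math.sqrt(m)) ** 2
                     for x, m in zip(observed, fitted))

observed = [18, 22, 25, 15]          # hypothetical frequencies
fitted = [20.0, 20.0, 20.0, 20.0]    # hypothetical fitted means
print(pearson_measure(observed, fitted))
print(square_root_measure(observed, fitted))
```

With moderate frequencies and modest deviations the two values are close; they separate when the deviations are large relative to the fitted means or when some frequencies are small.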


The chi-square measure in this chapter has additional advantages for more
complex models: The various chi-square values can be exhibited in an 
analysis-of-variance table, and the calculations can be based on the methods 
for the simple regression model. The values of the Karl Pearson measure at 
various fitted values cannot be compared directly; and the fitted vectors 
cannot be compared simultaneously as vectors in a Euclidean space. 

The location and normal properties of the square root transformation 
were derived for $x$ and $\theta$ large. The typical application involves moderate or
large values, and the normal approximation is, in fact, quite accurate. In 
certain applications, however, there may be x arrays that contain one or 
more extreme values x = 0; for example, the tests of independence in Section 
6. The normal approximation can remain reasonably accurate by using 
x + l and 6 + \ in place of x and 6; for example, 



The data in Section 4 were given by Carver (1927) and analyzed in the 
traditional chi-square manner by Fisher (1958, 1959). The data in Section 5 
were given by Rao (1961), and analyzed in the traditional chi-square manner 
by Rao (1965). 

The examples in this chapter were analyzed by L. M. Steinberg. 

Carver, W. A. (1927), A genetic study of certain chlorophyll deficiencies in maize, Genetics,
12, 415-440.

Fisher, R. A. (1958), Statistical Methods for Research Workers, Oliver and Boyd, Edinburgh. 
Fisher, R. A. (1959), Statistical Methods and Scientific Inference, Oliver and Boyd, 
Edinburgh. 

Pearson, K. (1900), On the criterion that a given system of deviations from the probable 
in the case of a correlated system of variables is such that it can be reasonably supposed 
to have arisen from random sampling, Phil. Mag., 1, 157-175. 

Rao, C. R. (1961), A study of large sample test criteria through properties of efficient 
estimates, Sankhya, A23, 25-40. 

Rao, C. R. (1965), Linear Statistical Inference and Its Applications, Wiley, New York. 

PROBLEMS 

1. Show that the composite multinomial model is obtained if batches of Poisson variables 
are conditioned (details in Section 1). 

2. Show that the additional conditions applied to the special composite multinomial 
produce the hypergeometric model (details in Section 1). 

3. A die was tossed 1600 times: 

Event 1 2 3 4 5 6 

Frequency 301 308 340 214 196 241. 

Make a test of significance for the hypothesis that the die is true (symmetrical). 
4. The progeny of a mating were classified by attribute into three groups.

Event          E_1     E_2     E_3

Frequency       10      53      46.

According to a model the corresponding probabilities should be

Event          E_1      E_2           E_3

Probability    $p^2$    $2p(1 - p)$   $(1 - p)^2$

where 0 < p < 1. Test whether the data are in accord with the model. If appropriate,
determine the structural distribution for p. Start: p = 0.1. (Mood and Graybill.)

5. One hundred plants were classified according to two attributes: large L or small l; white
W or colored w. The frequencies are

              W       w     Totals

L            40      20       60
l            15      25       40

Totals       55      45      100

Analyze the data with the succession of models:

(i) Independence between the attributes (use marginal probability $p_1$ for L and $p_2$
for W; start: $p_1 = 0.1$, $p_2 = 0.1$).

(ii) Equal probabilities ($\tfrac{1}{4}$) for the four cells.

Calculate the analysis-of-variance table; make appropriate tests of significance. (Lindley.)


CHAPTER EIGHT


Inference from Likelihood



Without any structuring relationship between the quantity and the response 
there remains, in the case of a continuous response, only the likelihood 
function to identify a response value. This chapter considers multiple 
observations from such a continuous response. Subject to some regularity 
conditions, it is shown that, as the number of observations approaches 
infinity, the likelihood function approaches a limiting form that has a single 
variable in location relationship to the quantity. 

*1 LIKELIHOOD FUNCTION: FAR FROM THE QUANTITY 

Consider a continuous response variable $x$ and a quantity $\theta$. Suppose there
is no structuring relationship between the quantity and the response, only a
probability density function $f(x:\theta)$ for the response variable for each value of
the quantity: the classical model of statistics. The probability density function
is with respect to a differential for $x$; this is made explicit as needed.

The likelihood function from an observed response value was defined in
Section 1, Chapter Four. The alternative form as a log-likelihood function is
convenient for analysis here:

\[ l(x_0:\theta) = R(x_0) + \ln f(x_0:\theta). \]

This chapter is concerned with the shape of the likelihood function, and it
investigates the shape by examining differences for different $\theta$ values:

\[ l(x_0:\theta'') - l(x_0:\theta') = \ln f(x_0:\theta'') - \ln f(x_0:\theta'). \]

(See Figure 1.) For this it is convenient, in this chapter, to let $l(x:\theta)$ designate
the logarithm of the density function,

\[ l(x:\theta) = \ln f(x:\theta), \]

and to be aware that only differences,

\[ l(x:\theta'') - l(x:\theta'), \]

represent characteristics of the proper log-likelihood function for a response
value $x$.

Figure 1 The log-likelihood difference from $\theta'$ to $\theta''$.

In this chapter the likelihood function is examined as a variable, as a
function of the response variable $x$. For this it is convenient notationally to let
$\theta^0$ designate the actual value of the quantity, the value that determines the
distribution of the variable $x$, and to let $\theta$ be a free variable designating
possible values for the quantity. The likelihood function is examined as a
variable by analyzing differences, such as

\[ d(x:\theta) = l(x:\theta) - l(x:\theta^0) = \ln f(x:\theta) - \ln f(x:\theta^0), \]

as variables, based on the response variable $x$.

Consider multiple observations on the response: $x_1, \ldots, x_n$. The likelihood
difference for the vector response $\mathbf{x} = (x_1, \ldots, x_n)'$ is

\[ d(\mathbf{x}:\theta) = l(\mathbf{x}:\theta) - l(\mathbf{x}:\theta^0) = \ln f(\mathbf{x}:\theta) - \ln f(\mathbf{x}:\theta^0) \]
\[ = \sum_{i=1}^{n} \left( l(x_i:\theta) - l(x_i:\theta^0) \right) = \sum_{i=1}^{n} d(x_i:\theta). \]

Figure 2 The log-likelihood difference from $\theta^0$ to $\theta$: $d(\mathbf{x}:\theta)$. The supremum of the
log-likelihood difference for $\theta$ outside the $\delta$ neighborhood of $\theta^0$.

In this section this difference is examined for $\theta$ values outside a small
neighborhood of the actual $\theta^0$. It is shown under mild conditions that the maximum
value outside a given small neighborhood goes to $-\infty$ with probability 1 as
$n \to \infty$ (see Figure 2). This main result is presented as a theorem in this
section; its proof is based on a succession of lemmas.

Lemma 1. If the distribution $f(x:\theta)$ is different from the distribution
$f(x:\theta^0)$ and if the mean

\[ E\{l(x:\theta^0):\theta^0\} \]

is finite, then

\[ E\{l(x:\theta):\theta^0\} < E\{l(x:\theta^0):\theta^0\} \]

or

\[ E\{d(x:\theta):\theta^0\} < 0. \]

Proof. A real-valued function $c(t)$ of a real variable $t$ is strictly convex if

\[ c(at' + (1 - a)t'') < a\,c(t') + (1 - a)\,c(t'') \]

for all distinct $t'$, $t''$ and $0 < a < 1$. A strictly convex function $c(t)$ has a line of support
$l(t)$ at any point $t'$:

\[ l(t) < c(t), \quad t \ne t', \]
\[ l(t') = c(t'), \]

where $l(t)$ is linear in $t$. (See Figure 3.) If $t$ is a variable having a distribution
with mean $E\{t\} = \nu$, then

\[ c(E\{t\}) < E\{c(t)\} \]

unless $t$ has all probability at $\nu$ (that is, unless $t$ is a constant). This follows
by letting $l(t)$ be the line of support at $\nu$:

\[ c(E\{t\}) = l(E\{t\}) = E\{l(t)\} < E\{c(t)\}. \]

Now consider the mean value of the likelihood difference $d(x:\theta) = l(x:\theta) - l(x:\theta^0)$:

\[ E\{d(x:\theta):\theta^0\} = E\{l(x:\theta):\theta^0\} - E\{l(x:\theta^0):\theta^0\} \]
\[ = E\{\ln f(x:\theta):\theta^0\} - E\{\ln f(x:\theta^0):\theta^0\} \]
\[ = E\left\{ \ln \frac{f(x:\theta)}{f(x:\theta^0)} : \theta^0 \right\} < \ln E\left\{ \frac{f(x:\theta)}{f(x:\theta^0)} : \theta^0 \right\} \le \ln 1 = 0. \]

The succession of steps uses the strict convexity of $-\ln t$ and the fact that
the integral of $f(x:\theta)$ over points having $f(x:\theta^0) > 0$ is less than or equal to
1. This establishes the lemma.
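
A small numerical check can make Lemma 1 concrete. The sketch below assumes, purely for illustration, a normal density with unit variance and location $\theta$; for that model the mean likelihood difference works out to $-(\theta - \theta^0)^2/2$. The integral is approximated by the trapezoid rule, and the function names and integration limits are choices made here, not part of the text.

```python
import math

def log_density(x, theta):
    """Log of the normal density with mean theta and unit variance (assumed model)."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2

def mean_difference(theta, theta0, half_width=12.0, steps=24000):
    """Trapezoid approximation to E{ l(x:theta) - l(x:theta0) : theta0 }."""
    h = 2 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = theta0 - half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        d = log_density(x, theta) - log_density(x, theta0)
        total += weight * d * math.exp(log_density(x, theta0))
    return total * h

for theta in (0.3, 1.5, -2.0):
    print(theta, mean_difference(theta, 0.0), -0.5 * theta ** 2)
```

The computed means are negative for every $\theta \ne \theta^0$ and vanish only at $\theta = \theta^0$, as the lemma requires.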

Lemma 2. If the distribution $f(x:\theta)$ is different from the distribution
$f(x:\theta^0)$, and if the mean

\[ E\{l(x:\theta^0):\theta^0\} \]

is finite, then for a sequence of response observations, $x_1, x_2, \ldots$,

\[ \Pr\left\{ \lim_{n\to\infty} \sum_{i=1}^{n} d(x_i:\theta) = -\infty : \theta^0 \right\} = 1. \]

Proof. The lemma is concerned with the likelihood difference

\[ l(\mathbf{x}:\theta) - l(\mathbf{x}:\theta^0) = \sum_i \left( l(x_i:\theta) - l(x_i:\theta^0) \right) = \sum_i d(x_i:\theta); \]

the lemma asserts that with probability 1 the limit is $-\infty$ as $n \to \infty$.

By Lemma 1 the mean value of $d(x:\theta)$ satisfies

\[ E\{d(x:\theta):\theta^0\} = \epsilon < 0. \]

Then, by the strong law of large numbers,

\[ \Pr\left\{ \lim_{n\to\infty} \frac{\sum_{i=1}^{n} d(x_i:\theta)}{n} = \epsilon : \theta^0 \right\} = 1. \]

Hence

\[ \Pr\left\{ \lim_{n\to\infty} \sum_{i=1}^{n} d(x_i:\theta) = -\infty : \theta^0 \right\} = 1 \]

and the lemma is established.

Corollary. If $E\{d(x):\theta^0\} < 0$ for some $d(x)$, then

\[ \Pr\left\{ \lim_{n\to\infty} \sum_{i=1}^{n} d(x_i) = -\infty : \theta^0 \right\} = 1. \]

Now consider several $\theta$ values, $\theta^1, \ldots, \theta^h$, and suppose that the distribution
of $x$ for each of these values is different from the distribution $f(x:\theta^0)$. The
following lemma asserts that the maximum likelihood difference to these
values goes to $-\infty$ with probability 1 as $n \to \infty$.

Lemma 3. If the distributions $f(x:\theta^1), \ldots, f(x:\theta^h)$ are different from
the distribution $f(x:\theta^0)$, and if the mean

\[ E\{l(x:\theta^0):\theta^0\} \]

is finite, then for a sequence of response observations $x_1, x_2, \ldots$

\[ \Pr\left\{ \lim_{n\to\infty} \max_{\alpha=1,\ldots,h} \sum_{i=1}^{n} d(x_i:\theta^\alpha) = -\infty : \theta^0 \right\} = 1. \]

Proof. Consider a sequence $s_n(\alpha)$ of real numbers for each $\alpha = 1, \ldots, h$. If

\[ \lim_{n\to\infty} s_n(\alpha) = -\infty \]

for each $\alpha$, then

\[ \lim_{n\to\infty} \max_{\alpha=1,\ldots,h} s_n(\alpha) = -\infty; \]

and conversely.

Consider events $A_1, \ldots, A_h$ and suppose $\Pr\{A_\alpha\} = 1$ for each $\alpha$. Then the
probability that all the events $A_\alpha$ occur is equal to 1:

\[ \Pr\left\{ \bigcap_{\alpha=1}^{h} A_\alpha \right\} = 1 - \Pr\left\{ \bigcup_{\alpha=1}^{h} \bar{A}_\alpha \right\}
\ge 1 - \sum_{\alpha=1}^{h} \Pr\{\bar{A}_\alpha\} = 1 - \sum_{\alpha=1}^{h} 0 = 1, \]

where $\bar{A}_\alpha$ designates the nonoccurrence of $A_\alpha$.

Let $A_\alpha$ be the event:

\[ \lim_{n\to\infty} \sum_{i=1}^{n} d(x_i:\theta^\alpha) = -\infty. \]

The results in the preceding paragraphs then establish the lemma.

Corollary. If $E\{d^\alpha(x):\theta^0\} < 0$ for $d^1(x), \ldots, d^h(x)$, then

\[ \Pr\left\{ \lim_{n\to\infty} \max_{\alpha=1,\ldots,h} \sum_{i=1}^{n} d^\alpha(x_i) = -\infty : \theta^0 \right\} = 1. \]

The main theorem is concerned with the maximum likelihood difference
to $\theta$ values outside a neighborhood of $\theta^0$. This needs continuity so that what
happens at a $\theta^\alpha$ value controls what happens for $\theta$ values near $\theta^\alpha$. It also needs
some sort of uniformity so that a finite number of $\theta^\alpha$ values controls what
happens for all the values outside a neighborhood of $\theta^0$.

Now suppose that the range for the quantity $\theta$ is Euclidean space $R^k$. The
assumptions needed for the main theorem can be expressed more easily if
the point-at-infinity $\infty$ is added to $R^k$. Let $d(\theta', \theta'')$ designate Euclidean
distance in $R^k$, let

\[ B_\rho(\theta') = \{\theta : d(\theta, \theta') < \rho\} \]

designate the ball of radius $\rho$ about $\theta'$, and let

\[ B_\rho(\infty) = \{\theta : d(\theta, 0) > 1/\rho\} \]

designate a ball about $\infty$ (the region outside the sphere of radius $1/\rho$). The
balls $B_\rho(\theta)$ are neighborhoods for the points in $R^k \cup \{\infty\}$.

Assumption 1. $f(x:\theta)$ is a continuous function† of $\theta$ in $R^k \cup \{\infty\}$.
For each $\theta \ne \theta^0$, the distribution $f(x:\theta)$ is different from the distribution
$f(x:\theta^0)$. For each $\theta'$ in $R^k \cup \{\infty\}$ there is a neighborhood $B_\rho(\theta')$ such that

\[ \sup_{\theta \text{ in } B_\rho(\theta')} l(x:\theta) < M(x:\theta'); \]

the mean values

\[ E\{M(x:\theta'):\theta^0\}, \qquad E\{l(x:\theta^0):\theta^0\} \]

are finite.

† The function $f(x:\infty)$ is taken to be the limiting function $\lim_{\theta\to\infty} f(x:\theta)$ as implied by
continuity.

Lemma 4. If $f(x:\theta)$ satisfies Assumption 1, then

\[ \lim_{\rho\to 0} E\left\{ \sup_{\theta \text{ in } B_\rho(\theta')} l(x:\theta) : \theta^0 \right\} = E\{l(x:\theta'):\theta^0\}. \]

Proof. By the continuity part of Assumption 1, it follows that

\[ \lim_{\rho\to 0}\ \sup_{\theta \text{ in } B_\rho(\theta')} l(x:\theta) = l(x:\theta'). \]

The function

\[ \sup_{\theta \text{ in } B_\rho(\theta')} l(x:\theta) \]

is monotone decreasing as $\rho \to 0$. For $\rho$ small enough it is bounded above by
the function $M(x:\theta')$, which has finite mean value. It follows, by the monotone
convergence theorem for integrals, that the limit operation with respect
to $\rho$ can be carried outside the integral sign:

\[ E\left\{ \lim_{\rho\to 0}\ \sup_{\theta \text{ in } B_\rho(\theta')} l(x:\theta) : \theta^0 \right\}
= \lim_{\rho\to 0} E\left\{ \sup_{\theta \text{ in } B_\rho(\theta')} l(x:\theta) : \theta^0 \right\}. \]

This establishes the lemma.

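Theorem 5. If the classical model $f(x:\theta)$ satisfies Assumption 1, then

\[ \Pr\left\{ \lim_{n\to\infty}\ \sup_{d(\theta,\theta^0) > \delta}\ \sum_{i=1}^{n} d(x_i:\theta) = -\infty : \theta^0 \right\} = 1. \]

Proof. For each value $\theta'$ different from $\theta^0$ there is, by Lemmas 4 and 1, a
neighborhood $B_\rho(\theta')$ such that

\[ E\left\{ \sup_{\theta \text{ in } B_\rho(\theta')} l(x:\theta) : \theta^0 \right\} < E\{l(x:\theta^0):\theta^0\}. \]

Consider the region of $\theta$ values in $R^k \cup \{\infty\}$ having $d(\theta, \theta^0) \ge \delta$: each $\theta'$
value in this region has a neighborhood $B_\rho(\theta')$ for which the preceding
inequality holds. By the Heine-Borel theorem a finite number of these
neighborhoods can be found that cover the region. Let the corresponding $\theta$
values be $\theta^1, \ldots, \theta^h$ (one of these is $\theta = \infty$). Then by the corollary of
Lemma 3, with

\[ d^\alpha(x) = \sup_{\theta \text{ in } B_\rho(\theta^\alpha)} l(x:\theta) - l(x:\theta^0), \]

it follows that

\[ \Pr\left\{ \lim_{n\to\infty} \max_{\alpha} \sum_{i=1}^{n} \left( \sup_{\theta \text{ in } B_\rho(\theta^\alpha)} l(x_i:\theta) - l(x_i:\theta^0) \right) = -\infty : \theta^0 \right\} = 1. \]

The theorem is then established by noting that

\[ \max_{\alpha} \sum_{i=1}^{n} \left( \sup_{\theta \text{ in } B_\rho(\theta^\alpha)} l(x_i:\theta) - l(x_i:\theta^0) \right)
\ \ge\ \sup_{d(\theta,\theta^0) > \delta}\ \sum_{i=1}^{n} \left( l(x_i:\theta) - l(x_i:\theta^0) \right). \]

Under the mild conditions of Assumption 1, the theorem asserts that for a
sequence of response observations $x_1, x_2, \ldots$ the maximum likelihood (as a
difference relative to $\theta^0$) outside a neighborhood of the actual $\theta^0$ goes to $-\infty$
with probability 1.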

2 LIKELIHOOD FUNCTION: NEAR THE QUANTITY 

Now consider the form of the likelihood function near the quantity, in a
small neighborhood of the actual $\theta^0$. With multiple observations $x_1, \ldots, x_n$
on the response, the likelihood difference from $\theta^0$ to a general value $\theta$ is

\[ d(\mathbf{x}:\theta) = l(\mathbf{x}:\theta) - l(\mathbf{x}:\theta^0) = \ln f(\mathbf{x}:\theta) - \ln f(\mathbf{x}:\theta^0) \]
\[ = \sum_{i=1}^{n} \left( l(x_i:\theta) - l(x_i:\theta^0) \right) = \sum_{i=1}^{n} d(x_i:\theta). \]

This likelihood difference is examined in this section for $\theta$ in a small
neighborhood of $\theta^0$; and it is examined as a variable, based on the multiple
response variable $\mathbf{x}$. It is shown that as $n \to \infty$ the limiting form is quadratic,
with a single variable in location relationship to the quantity. The analysis is
given in Section 2.1 for the notationally easy case of a real quantity $\theta$; the
analogous results for a vector quantity are summarized at the end of
Section 2.2.
2.1 A Real-Valued Quantity $\theta$. Consider first the slope of the likelihood
function at the actual value $\theta^0$. Let

\[ l^{(1)}(x:\theta) = \frac{\partial}{\partial\theta}\, l(x:\theta) = \frac{\partial}{\partial\theta} \ln f(x:\theta), \qquad
   l^{(11)}(x:\theta) = \frac{\partial^2}{\partial\theta^2}\, l(x:\theta), \]

\[ f^{(1)}(x:\theta) = \frac{\partial}{\partial\theta}\, f(x:\theta), \qquad
   f^{(11)}(x:\theta) = \frac{\partial^2}{\partial\theta^2}\, f(x:\theta). \]

The following assumptions are convenient.

Assumption 2. In a neighborhood of $\theta^0$,

\[ |f^{(11)}(x:\theta)| < M_1(x) \]

and $M_1(x)$ and $f^{(1)}(x:\theta^0)$ are integrable.

Assumption 3. The likelihood derivatives $l^{(1)}(x:\theta)$, $l^{(11)}(x:\theta)$ exist in a
neighborhood of $\theta^0$ and

\[ E\{l^{(11)}(x:\theta^0):\theta^0\} \]

is finite-valued.

Lemma 6 establishes mean value properties of the likelihood derivatives.

Lemma 6. If the classical model $f(x:\theta)$ satisfies Assumptions 2 and 3,
then

\[ E\{l^{(1)}(x:\theta^0):\theta^0\} = 0, \]
\[ E\{-l^{(11)}(x:\theta^0):\theta^0\} = j(\theta^0) = \operatorname{var}\{l^{(1)}(x:\theta^0):\theta^0\}. \]

Proof. The density function $f(x:\theta)$ can be expanded in a Taylor series
with respect to $\theta$ at $\theta^0$:

\[ f(x:\theta) = f(x:\theta^0) + (\theta - \theta^0)\, f^{(1)}(x:\theta^0) + \frac{(\theta - \theta^0)^2}{2}\, f^{(11)}(x:\theta^*), \]

where $|\theta^* - \theta^0| < |\theta - \theta^0|$. By Assumption 2 it follows that $f(x:\theta)$ and
$f^{(1)}(x:\theta)$ are bounded by integrable functions for $\theta$ in a neighborhood of $\theta^0$.
This permits differentiation with respect to $\theta$ to be carried through the
integral sign:

\[ \int f(x:\theta)\, dx = 1, \qquad \int f^{(1)}(x:\theta)\, dx = 0, \qquad \int f^{(11)}(x:\theta)\, dx = 0 \]

(the differential $dx$ can be a Euclidean differential, or it can be a general
differential). By Assumption 3 the integrands can be rearranged:

\[ \int f^{(1)}(x:\theta)\, dx = \int l^{(1)}(x:\theta)\, f(x:\theta)\, dx = 0, \]
\[ \int f^{(11)}(x:\theta)\, dx = \int \left( l^{(11)}(x:\theta)\, f(x:\theta) + l^{(1)}(x:\theta)\, f^{(1)}(x:\theta) \right) dx = 0. \]

At $\theta^0$ this gives

\[ E\{l^{(1)}(x:\theta^0):\theta^0\} = 0, \]
\[ E\{-l^{(11)}(x:\theta^0):\theta^0\} = E\{(l^{(1)}(x:\theta^0))^2:\theta^0\}, \]

which establishes the lemma. The mean-value characteristic of the likelihood
slope at $\theta^0$,

\[ E\{(l^{(1)}(x:\theta^0))^2:\theta^0\} = \operatorname{var}\{l^{(1)}(x:\theta^0):\theta^0\} = j(\theta^0), \]

is called the Fisher information at $\theta^0$.
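
The two identities of Lemma 6 can be checked numerically for a particular model. The sketch below assumes, only as an example, a normal density with mean 0 and standard deviation $\theta$, for which $l^{(1)} = -1/\theta + x^2/\theta^3$, $l^{(11)} = 1/\theta^2 - 3x^2/\theta^4$, and the information is $j(\theta^0) = 2/(\theta^0)^2$. The integrals are approximated by the trapezoid rule; the function name and integration limits are choices made here.

```python
import math

def likelihood_moments(theta0, half_width=12.0, steps=24000):
    """Trapezoid means of l1, l1^2 and -l11 under a normal(0, theta0^2) response."""
    lo = -half_width * theta0
    h = 2 * half_width * theta0 / steps
    mean_l1 = mean_l1_sq = mean_neg_l11 = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        density = math.exp(-0.5 * (x / theta0) ** 2) / (theta0 * math.sqrt(2 * math.pi))
        l1 = -1.0 / theta0 + x * x / theta0 ** 3            # first derivative in theta
        l11 = 1.0 / theta0 ** 2 - 3.0 * x * x / theta0 ** 4  # second derivative in theta
        mean_l1 += weight * l1 * density * h
        mean_l1_sq += weight * l1 * l1 * density * h
        mean_neg_l11 += weight * (-l11) * density * h
    return mean_l1, mean_l1_sq, mean_neg_l11

m1, m2, m3 = likelihood_moments(2.0)
print(m1, m2, m3, 2.0 / 2.0 ** 2)
```

The first value is essentially zero and the other two agree with $2/(\theta^0)^2$, so the mean of the slope vanishes and its variance equals the mean of the negative second derivative.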

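For a sequence of response observations $x_1, x_2, \ldots$, the second lemma in
this section (Lemma 7) establishes some distribution properties of the
likelihood derivative at $\theta^0$. For the vector $\mathbf{x} = (x_1, \ldots, x_n)'$ the logarithm of the
density function is

\[ l(\mathbf{x}:\theta) = \sum_{i=1}^{n} l(x_i:\theta), \]

and derivatives are

\[ l^{(1)}(\mathbf{x}:\theta) = \sum_{i=1}^{n} l^{(1)}(x_i:\theta), \qquad
   l^{(11)}(\mathbf{x}:\theta) = \sum_{i=1}^{n} l^{(11)}(x_i:\theta). \]

At $\theta^0$ the properties of mean and variance of independent variables give

\[ E\{l^{(1)}(\mathbf{x}:\theta^0):\theta^0\} = n\,E\{l^{(1)}(x:\theta^0):\theta^0\}, \]
\[ \operatorname{var}\{l^{(1)}(\mathbf{x}:\theta^0):\theta^0\} = n\,\operatorname{var}\{l^{(1)}(x:\theta^0):\theta^0\}. \]
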
Lemma 7. If $E\{l^{(1)}(x:\theta^0):\theta^0\}$, $\operatorname{var}\{l^{(1)}(x:\theta^0):\theta^0\}$ exist and if

\[ E\{l^{(1)}(x:\theta^0):\theta^0\} = 0, \]

then the distribution of

\[ \frac{l^{(1)}(\mathbf{x}:\theta^0)}{\sqrt{n}} = \frac{\sum_{i=1}^{n} l^{(1)}(x_i:\theta^0)}{\sqrt{n}} \]

approaches the normal distribution with mean 0 and variance $j(\theta^0)$ as $n \to \infty$.
If $j(\theta^0) > 0$, then the distribution of

\[ \frac{l^{(1)}(\mathbf{x}:\theta^0)}{\sqrt{n}\, j(\theta^0)} = \frac{\sum_{i=1}^{n} l^{(1)}(x_i:\theta^0)}{\sqrt{n}\, j(\theta^0)} \]

approaches the normal distribution with mean 0 and variance $1/j(\theta^0)$.

Proof. A direct application of the central limit theorem.

Now consider the form of the likelihood function near $\theta^0$ as the number $n$
of response observations becomes large. Suppose the model $f(x:\theta)$ satisfies
Assumptions 2 and 4 (Assumption 4 is a needed stronger version of
Assumption 3):

Assumption 4. In a neighborhood of $\theta^0$,

\[ |l^{(111)}(x:\theta)| < M_2(x), \]

and

\[ E\{l(x:\theta^0):\theta^0\}, \qquad E\{l^{(1)}(x:\theta^0):\theta^0\}, \]
\[ E\{l^{(11)}(x:\theta^0):\theta^0\} \ne 0, \qquad E\{M_2(x):\theta^0\} = -q(\theta^0) \]

are finite-valued.

The likelihood difference near $\theta^0$ can be expanded in a Taylor series
(Assumption 4):

\[ l(\mathbf{x}:\theta) - l(\mathbf{x}:\theta^0)
 = (\theta - \theta^0)\, l^{(1)}(\mathbf{x}:\theta^0) + \frac{(\theta - \theta^0)^2}{2}\, l^{(11)}(\mathbf{x}:\theta^0) + \frac{(\theta - \theta^0)^3}{6}\, R\, M_2(\mathbf{x}), \]

where

\[ M_2(\mathbf{x}) = \sum_{i=1}^{n} M_2(x_i) \]

and $|R| < 1$; the expansion replaces a continuum of functions of $\mathbf{x}$ indexed
by $\theta$ (on the left side) by effectively three functions of $\mathbf{x}$ (on the right side).
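With Lemma 6 the strong law of large numbers gives

\[ \Pr\left\{ \lim_{n\to\infty} \frac{l^{(11)}(\mathbf{x}:\theta^0)}{n} = -j(\theta^0) : \theta^0 \right\} = 1, \]
\[ \Pr\left\{ \lim_{n\to\infty} \frac{M_2(\mathbf{x})}{n} = -q(\theta^0) : \theta^0 \right\} = 1. \]

The convergence theorem for orthogonal variables gives

\[ \Pr\left\{ \lim_{n\to\infty} \frac{l^{(1)}(\mathbf{x}:\theta^0)}{n^{1/2+\epsilon}} = 0 : \theta^0 \right\} = 1 \]

for any $\epsilon > 0$. These probability limits suggest a rearrangement of the
expression for the likelihood difference near $\theta^0$. Let

\[ V(\mathbf{x}:\theta) = -\left( l^{(11)}(\mathbf{x}:\theta^0) + \frac{\theta - \theta^0}{3}\, R\, M_2(\mathbf{x}) \right); \]

then

\[ l(\mathbf{x}:\theta) - l(\mathbf{x}:\theta^0) = (\theta - \theta^0)\, l^{(1)}(\mathbf{x}:\theta^0) - \frac{(\theta - \theta^0)^2}{2}\, V(\mathbf{x}:\theta) \]
\[ = -\frac{V(\mathbf{x}:\theta)}{2} \left( \theta - \theta^0 - \frac{l^{(1)}(\mathbf{x}:\theta^0)}{V(\mathbf{x}:\theta)} \right)^2 + \frac{(l^{(1)}(\mathbf{x}:\theta^0))^2}{2\,V(\mathbf{x}:\theta)} \]
\[ = -\frac{n}{2}\, \frac{V(\mathbf{x}:\theta)}{n} \left( \theta - \theta^0 - \frac{l^{(1)}(\mathbf{x}:\theta^0)/n}{V(\mathbf{x}:\theta)/n} \right)^2 + \frac{n}{2}\, \frac{(l^{(1)}(\mathbf{x}:\theta^0)/n)^2}{V(\mathbf{x}:\theta)/n}. \]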

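Consider the likelihood difference about $\theta^0$ in units of length $n^{-1/2}$:

\[ \theta = \theta^0 + \tau n^{-1/2}. \]

The likelihood difference is

\[ l(\mathbf{x}:\theta^0 + \tau n^{-1/2}) - l(\mathbf{x}:\theta^0)
 = -\frac{1}{2}\, \frac{V(\mathbf{x}:\theta)}{n} \left( \tau - w\, \frac{j(\theta^0)}{V(\mathbf{x}:\theta)/n} \right)^2
   + \frac{1}{2}\, \frac{V(\mathbf{x}:\theta)}{n} \left( w\, \frac{j(\theta^0)}{V(\mathbf{x}:\theta)/n} \right)^2, \]

where

\[ w = \frac{l^{(1)}(\mathbf{x}:\theta^0)}{\sqrt{n}\, j(\theta^0)}. \]

Preceding results derived from the strong law of large numbers give

\[ \Pr\left\{ \lim_{n\to\infty} \frac{V(\mathbf{x}:\theta)}{n} = j(\theta^0) : \theta^0 \right\} = 1, \]

and hence

\[ \Pr\left\{ \lim_{n\to\infty} \frac{j(\theta^0)}{V(\mathbf{x}:\theta)/n} = 1 : \theta^0 \right\} = 1. \]

By Lemma 7, the variable $w$ has a limiting normal distribution with mean 0
and variance $1/j(\theta^0)$. It follows then that with probability 1 the limiting form
of the likelihood difference for bounded $\tau$ is

\[ \lim_{n\to\infty} \left( l(\mathbf{x}:\theta^0 + \tau n^{-1/2}) - l(\mathbf{x}:\theta^0) \right)
 = -\frac{j(\theta^0)}{2}\,(w - \tau)^2 + \frac{j(\theta^0)}{2}\, w^2. \]

The limiting form involves a single variable $w$ having a limiting normal
distribution with mean 0 and variance $1/j(\theta^0)$.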

Now consider the likelihood difference elsewhere in the neighborhood
$(\theta^0 - \delta, \theta^0 + \delta)$ of $\theta^0$. For this choose $\delta$ small enough that Assumption 4
will hold in the interval $(\theta^0 - \delta, \theta^0 + \delta)$ and small enough that


$$j(\theta^0) - \frac{\delta}{3}\, q(\theta^0) > V,$$


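where $V$ is a positive number. Let

$$\theta = \theta^0 + \tau n^{-1/2+\epsilon},$$

where $|\tau| < \delta$ and $0 < \epsilon < \tfrac{1}{2}$. The likelihood difference is

$$l(\mathbf{x}:\theta^0 + \tau n^{-1/2+\epsilon}) - l(\mathbf{x}:\theta^0) = \tau n^{-1/2+\epsilon}\, l^{(1)}(\mathbf{x}:\theta^0) - \tau^2 n^{-1+2\epsilon}\, \frac{V(\mathbf{x}:\theta)}{2}$$

$$= -n^{2\epsilon}\left(\frac{\tau^2\, V(\mathbf{x}:\theta)}{2n} - \frac{\tau\, l^{(1)}(\mathbf{x}:\theta^0)}{n^{1/2+\epsilon}}\right).$$

With probability 1 the expression in parentheses is greater than $\tau^2 V/2$ for all
$\epsilon$ in the range $0 < \epsilon_0 < \epsilon < \tfrac{1}{2}$. Then with probability 1 the maximum
likelihood difference for $\epsilon$ in the range $0 < \epsilon_0 < \epsilon < \tfrac{1}{2}$ has limit $-\infty$. This holds
for all $\epsilon_0$ ($0 < \epsilon_0 < \tfrac{1}{2}$).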

The maximum likelihood difference outside a neighborhood $(\theta^0 - \delta, \theta^0 + \delta)$,
by Assumption 1 and Theorem 5 (preceding section), goes to $-\infty$
with probability 1.

Hence the limiting form of the likelihood difference relative to the actual $\theta^0$ is

$$-\frac{j(\theta^0)}{2}(w - \tau)^2 + \frac{j(\theta^0)}{2}\, w^2$$

in terms of $\tau$ given by $\theta = \theta^0 + \tau n^{-1/2}$, and is $-\infty$ otherwise; the variable $w$
has a limiting normal distribution with mean 0 and variance $1/j(\theta^0)$ (see Figure 4).

Figure 4. The limiting form of the likelihood difference relative to the actual $\theta^0$.

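The likelihood function has a maximum at $\tau = w$. If the likelihood difference
is taken with respect to the maximum likelihood, the limiting form of the
likelihood function becomes

$$-\frac{j(\theta^0)}{2}(w - \tau)^2 + \frac{j(\theta^0)}{2}\, w^2 - \left[-0 + \frac{j(\theta^0)}{2}\, w^2\right] = -\frac{j(\theta^0)}{2}(w - \tau)^2$$

(see Figure 5).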

The limiting form of the likelihood function involves a single variable $w$;
the variable $w$ has a limiting normal distribution with variance $1/j(\theta^0)$; the
limiting form of the likelihood function is the likelihood function of $w$ as a
normal variable with location quantity $\tau$; the actual value for the quantity $\tau$
is $\tau = 0$ (note that the actual value for the quantity was chosen as the origin
for the present analysis).
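
The quadratic limiting form can be checked numerically in a case where it holds exactly. The following Python sketch assumes a normal model with unit variance, for which $j(\theta^0) = 1$ and the third derivative of the log-likelihood vanishes, so the remainder term is zero for every $n$. The data values and the names `log_likelihood` and `quadratic_form` are illustrative assumptions, not part of the text.

```python
import math

def log_likelihood(xs, theta):
    """Log-likelihood of independent N(theta, 1) observations (assumed model)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2 for x in xs)

def quadratic_form(w, tau, j):
    """The form -(j/2)(w - tau)^2 + (j/2) w^2 of the likelihood difference."""
    return -0.5 * j * (w - tau) ** 2 + 0.5 * j * w ** 2

xs = [0.3, -1.2, 0.8, 1.9, -0.4, 0.1, 2.2, -0.7]  # illustrative data
theta0 = 0.5                                       # reference value playing the role of theta^0
n = len(xs)
j = 1.0                                            # information per observation for N(theta, 1)
score = sum(x - theta0 for x in xs)                # l^(1)(x: theta0)
w = score / (math.sqrt(n) * j)

for tau in (-2.0, -0.5, 0.0, 1.0, 3.0):
    theta = theta0 + tau / math.sqrt(n)
    diff = log_likelihood(xs, theta) - log_likelihood(xs, theta0)
    print(tau, diff, quadratic_form(w, tau, j))
```

For each `tau` the two printed values agree up to rounding; for other models the agreement would hold only in the limit described above.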

*2.2 A Vector-Valued Quantity $\boldsymbol{\theta}$. Consider now a vector-valued quantity
$\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)$; and let $\boldsymbol{\theta}^0 = (\theta_1^0, \ldots, \theta_k^0)$ designate the actual value of
the quantity, the value that determines the distribution of the response
variable $x$.

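Derivatives of the likelihood function can be taken with respect to the
different coordinates of $\boldsymbol{\theta}$:

$$l^{(j)}(x:\boldsymbol{\theta}) = \frac{\partial}{\partial\theta_j}\, l(x:\boldsymbol{\theta}) = \frac{\partial}{\partial\theta_j} \ln f(x:\boldsymbol{\theta}),$$

$$l^{D}(x:\boldsymbol{\theta}) = \left(l^{(1)}(x:\boldsymbol{\theta}), \ldots, l^{(k)}(x:\boldsymbol{\theta})\right)',$$

$$l^{(jj')}(x:\boldsymbol{\theta}) = \frac{\partial^2}{\partial\theta_j\,\partial\theta_{j'}}\, l(x:\boldsymbol{\theta}) = \frac{\partial^2}{\partial\theta_j\,\partial\theta_{j'}} \ln f(x:\boldsymbol{\theta}),$$

$$l^{(jj'j'')}(x:\boldsymbol{\theta}) = \frac{\partial^3}{\partial\theta_j\,\partial\theta_{j'}\,\partial\theta_{j''}}\, l(x:\boldsymbol{\theta}).$$
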

Assumptions 2, 3, and 4 can be generalized by replacing a first derivative by
each of the first-order derivatives in turn; by replacing a second derivative by
each of the second-order derivatives in turn; by replacing a third derivative
by each of the third-order derivatives in turn; and by replacing

$$E\{l^{(11)}(x:\theta^0):\theta^0\} \neq 0$$

by the nonsingularity of the matrix

$$E\left\{\begin{pmatrix} l^{(11)}(x:\boldsymbol{\theta}^0) & \cdots & l^{(1k)}(x:\boldsymbol{\theta}^0) \\ \vdots & & \vdots \\ l^{(k1)}(x:\boldsymbol{\theta}^0) & \cdots & l^{(kk)}(x:\boldsymbol{\theta}^0) \end{pmatrix} : \boldsymbol{\theta}^0\right\}.$$


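For a vector-valued quantity $\boldsymbol{\theta}$, Lemma 6 becomes the following:

Lemma 8. If the classical model $f(x:\boldsymbol{\theta})$ satisfies the generalized
Assumptions 2 and 3, then

$$E\{l^{D}(x:\boldsymbol{\theta}^0):\boldsymbol{\theta}^0\} = 0,$$

$$E\{l^{D}(x:\boldsymbol{\theta}^0)\, l^{D\prime}(x:\boldsymbol{\theta}^0):\boldsymbol{\theta}^0\} = J(\boldsymbol{\theta}^0),$$

where

$$J(\boldsymbol{\theta}^0) = \begin{pmatrix} j_{11}(\boldsymbol{\theta}^0) & \cdots & j_{1k}(\boldsymbol{\theta}^0) \\ \vdots & & \vdots \\ j_{k1}(\boldsymbol{\theta}^0) & \cdots & j_{kk}(\boldsymbol{\theta}^0) \end{pmatrix},$$

$$j_{jj'}(\boldsymbol{\theta}^0) = E\{-l^{(jj')}(x:\boldsymbol{\theta}^0):\boldsymbol{\theta}^0\} = \operatorname{cov}\{l^{(j)}(x:\boldsymbol{\theta}^0),\, l^{(j')}(x:\boldsymbol{\theta}^0):\boldsymbol{\theta}^0\}.$$

Proof. A direct extension of the proof of Lemma 6. The mean-value
characteristic of the likelihood gradient ($l^{D}(x:\boldsymbol{\theta})$) at $\boldsymbol{\theta}^0$,

$$J(\boldsymbol{\theta}^0) = \operatorname{cov}\{l^{D}(x:\boldsymbol{\theta}^0),\, l^{D}(x:\boldsymbol{\theta}^0):\boldsymbol{\theta}^0\},$$

is called the Fisher information matrix at $\boldsymbol{\theta}^0$.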

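For a sequence of response observations $x_1, x_2, \ldots$ the likelihood
derivatives are additive:

$$l(\mathbf{x}:\boldsymbol{\theta}) = \sum_{1}^{n} l(x_i:\boldsymbol{\theta}),$$

$$l^{(jj')}(\mathbf{x}:\boldsymbol{\theta}) = \sum_{1}^{n} l^{(jj')}(x_i:\boldsymbol{\theta}).$$

At $\boldsymbol{\theta}^0$ the properties of mean and variance for independent variables give

$$E\{l^{D}(\mathbf{x}:\boldsymbol{\theta}^0):\boldsymbol{\theta}^0\} = n\,E\{l^{D}(x_1:\boldsymbol{\theta}^0):\boldsymbol{\theta}^0\},$$

$$\operatorname{cov}\{l^{D}(\mathbf{x}:\boldsymbol{\theta}^0),\, l^{D}(\mathbf{x}:\boldsymbol{\theta}^0):\boldsymbol{\theta}^0\} = n \operatorname{cov}\{l^{D}(x_1:\boldsymbol{\theta}^0),\, l^{D}(x_1:\boldsymbol{\theta}^0):\boldsymbol{\theta}^0\}.$$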

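For the vector-valued quantity $\boldsymbol{\theta}$, Lemma 7 becomes the following:

Lemma 9. If the mean and variance of $l^{D}(x:\boldsymbol{\theta}^0)$ exist at $\boldsymbol{\theta}^0$, and if
$E\{l^{D}(x:\boldsymbol{\theta}^0):\boldsymbol{\theta}^0\} = 0$, then the distribution of

$$\frac{l^{D}(\mathbf{x}:\boldsymbol{\theta}^0)}{\sqrt{n}} = \frac{1}{\sqrt{n}} \sum_{1}^{n} l^{D}(x_i:\boldsymbol{\theta}^0)$$

approaches the multivariate normal distribution with mean 0 and covariance
matrix $J(\boldsymbol{\theta}^0)$. If $J(\boldsymbol{\theta}^0)$ is nonsingular, then the distribution of

$$\mathbf{w} = \frac{1}{\sqrt{n}}\, J^{-1}(\boldsymbol{\theta}^0)\, l^{D}(\mathbf{x}:\boldsymbol{\theta}^0) = \frac{1}{\sqrt{n}} \sum_{1}^{n} J^{-1}(\boldsymbol{\theta}^0)\, l^{D}(x_i:\boldsymbol{\theta}^0)$$

approaches the multivariate normal distribution with mean 0 and covariance
matrix $J^{-1}(\boldsymbol{\theta}^0)$.

Proof. A direct application of the central limit theorem for vector
variables.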

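Now consider the form of the likelihood function near $\boldsymbol{\theta}^0$ as the number $n$
of response observations becomes large. Suppose the model $f(x:\boldsymbol{\theta})$ satisfies
the generalized Assumptions 2 and 4. The likelihood difference near $\boldsymbol{\theta}^0$ can be
expanded in a Taylor series and rearranged following the pattern for a real
quantity.

The limiting form of the likelihood difference relative to the actual $\boldsymbol{\theta}^0$ is

$$-\tfrac{1}{2}(\mathbf{w} - \boldsymbol{\tau})' J(\boldsymbol{\theta}^0)(\mathbf{w} - \boldsymbol{\tau}) + \tfrac{1}{2}\mathbf{w}' J(\boldsymbol{\theta}^0)\mathbf{w}$$

in terms of $\boldsymbol{\theta} = \boldsymbol{\theta}^0 + \boldsymbol{\tau} n^{-1/2}$, and is $-\infty$ otherwise. The variable $\mathbf{w}$,

$$\mathbf{w} = \frac{1}{\sqrt{n}}\, J^{-1}(\boldsymbol{\theta}^0)\, l^{D}(\mathbf{x}:\boldsymbol{\theta}^0),$$

has a limiting multivariate normal distribution with means equal to zero and
covariance matrix $J^{-1}(\boldsymbol{\theta}^0)$.

The likelihood function has a maximum at $\boldsymbol{\tau} = \mathbf{w}$. If the likelihood
difference is taken with respect to the maximum likelihood, then the limiting
form of the likelihood function becomes

$$-\tfrac{1}{2}(\mathbf{w} - \boldsymbol{\tau})' J(\boldsymbol{\theta}^0)(\mathbf{w} - \boldsymbol{\tau}) + \tfrac{1}{2}\mathbf{w}' J(\boldsymbol{\theta}^0)\mathbf{w} - \left[-0 + \tfrac{1}{2}\mathbf{w}' J(\boldsymbol{\theta}^0)\mathbf{w}\right]$$

$$= -\tfrac{1}{2}(\mathbf{w} - \boldsymbol{\tau})' J(\boldsymbol{\theta}^0)(\mathbf{w} - \boldsymbol{\tau}).$$

See Figure 6.

The limiting form of the likelihood function involves a single variable $\mathbf{w}$;
the variable $\mathbf{w}$ has a limiting normal distribution with covariance matrix
$J^{-1}(\boldsymbol{\theta}^0)$; the limiting form of the likelihood function is that of $\mathbf{w}$ as a normal
variable with location quantity $\boldsymbol{\tau}$; the actual value for the quantity $\boldsymbol{\tau}$ is $\boldsymbol{\tau} = 0$
(note that the actual value for the quantity was chosen as the origin for the
present description).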

3 LIKELIHOOD INFERENCE: LARGE SAMPLE 

Consider a continuous response variable $x$ and a quantity $\theta$. Suppose there
is no structuring relationship between the quantity $\theta$ and the response
$x$, just a classical model $f(x:\theta)$ satisfying Assumptions 1, 2, and 4. Consider
a large number $n$ of response observations $x_1, \ldots, x_n$.

Within the classical model a multiple response vector $(x_1, \ldots, x_n)$ has
only its likelihood function to identify it. By the results in the preceding
section, for sufficiently large $n$ the likelihood function has normal quadratic





form in the neighborhood of the actual value and approximates $-\infty$ elsewhere.
In the neighborhood of a value $\theta^*$ near the actual value the likelihood
depends on the single variable $w$:

$$w = \frac{l^{(1)}(\mathbf{x}:\theta^*)}{\sqrt{n}\, j(\theta^*)}.$$

The variable $w$ has a limiting normal distribution with location quantity† $\tau$,

$$\tau = \sqrt{n}\,(\theta - \theta^*),$$

and variance $1/j(\theta^*)$; the limiting likelihood function is the likelihood
function for the normal variable $w$.

Figure 6. The limiting form of the likelihood difference taken relative to the maximum
likelihood; $\mathbf{w}$ is a variable with limiting multivariate normal distribution with covariance
matrix $J^{-1}(\boldsymbol{\theta}^0)$; the location quantity for $\mathbf{w}$ is $\boldsymbol{\tau}$, and $\boldsymbol{\tau}$ has actual value 0 (but only because
the actual value was chosen as the origin in the analysis).

A change in the quantity $\theta$ produces a change in the likelihood function
$l(x:\theta)$ at various response values $x$. This produces a decrease in probability
for some values and an increase for other values. For a large sample this
produces a loss of certain response observations and a gain of other response
observations, and thereby produces a change in $w$. The large sample model
can then be approximated by the simple measurement model:

$$w = \tau + e.$$

The model has an error variable $e$ with a normal distribution with mean 0 and
variance $1/j(\theta^*)$; and it has a structural equation in which a realized value
from the error distribution provides the link between the observed value of $w$,

$$w = \frac{l^{(1)}(\mathbf{x}:\theta^*)}{\sqrt{n}\, j(\theta^*)},$$

and the unknown value of $\tau$.

† If $\theta = \theta^0$ designates the actual value for the quantity $\theta$, then $\tau = \tau^0 = \sqrt{n}\,(\theta^0 - \theta^*)$
designates the actual value of the location quantity $\tau$. The details of the change of reference
point from $\theta^0$ to an adjacent $\theta^*$ are examined in Problem 3.


This measurement model with reference value $\theta^*$ is applicable for $\theta^*$ close
to the true $\theta^0$ (within the range for the approximations in Section 2.1).

For convenience, the measurement model can be expressed directly in
terms of the quantity $\theta$. A change of scale by the factor $n^{-1/2}$ gives

$$\frac{l^{(1)}(\mathbf{x}:\theta^*)}{n\, j(\theta^*)} = (\theta - \theta^*) + \frac{e}{\sqrt{n}}.$$

(Note that the denominator $n\,j(\theta^*)$ is the variance of the likelihood derivative
for the multiple response calculated at $\theta^*$:

$$n\,j(\theta^*) = \operatorname{var}\{l^{(1)}(\mathbf{x}:\theta^*):\theta^*\} = \operatorname{var}\left\{\sum_{1}^{n} l^{(1)}(x_i:\theta^*):\theta^*\right\};$$

it is the Fisher information at $\theta^*$ for the multiple response $\mathbf{x}$.)

Now consider the analysis of a multiple-response vector $(x_1, \ldots, x_n)$.
General familiarity with the application may suggest an initial reference
value $\theta^{(0)}$. The limiting likelihood function appropriate to the reference value
$\theta^{(0)}$ may indicate that the maximum likelihood value is elsewhere; the
indicated position from the limiting likelihood is at the $\theta$ value given by

$$\theta - \theta^{(0)} = \frac{l^{(1)}(\mathbf{x}:\theta^{(0)})}{n\, j(\theta^{(0)})}.$$

Designate this value of $\theta$ by

$$\theta^{(1)} = \theta^{(0)} + \frac{l^{(1)}(\mathbf{x}:\theta^{(0)})}{n\, j(\theta^{(0)})}.$$

A similar analysis at the reference value $\theta^{(1)}$ may indicate that the maximum
is again elsewhere:

$$\theta^{(2)} = \theta^{(1)} + \frac{l^{(1)}(\mathbf{x}:\theta^{(1)})}{n\, j(\theta^{(1)})}.$$

Typically, several iterations lead to a reference value $\theta^*$ located approximately
at the maximum of the likelihood function. The approximating
measurement model is then

$$\frac{l^{(1)}(\mathbf{x}:\theta^*)}{n\, j(\theta^*)} = (\theta - \theta^*) + \frac{e}{\sqrt{n}}.$$

Analysis of the simple measurement model is given in Chapter One. Tests of
significance concerning a $\theta$ value can be made by calculating the corresponding
error value and comparing it with the error distribution. The structural
distribution for the quantity $\theta$ is normal with variance $1/(n\,j(\theta^*))$ and located at

$$\theta^* + \frac{l^{(1)}(\mathbf{x}:\theta^*)}{n\, j(\theta^*)}.$$

(If $\theta^*$ is the exact maximum, then $l^{(1)}(\mathbf{x}:\theta^*) = 0$ and the distribution is
located at $\theta^*$.)
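
The iteration above can be sketched in Python for one concrete model. The sketch assumes a Cauchy location model with unit scale, for which the information per observation is $j = 1/2$ at every $\theta$; the data are synthetic (evenly spaced Cauchy quantiles shifted to a hypothetical value `theta_true`), and the function names `score` and `scoring_path` are illustrative, not from the text.

```python
import math

def score(xs, theta):
    """Likelihood derivative l^(1)(x: theta) for the unit-scale Cauchy location model."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)

def scoring_path(xs, theta_start, j, steps=10):
    """Iterate theta <- theta + l^(1)(x: theta) / (n j) from a starting reference value."""
    n = len(xs)
    path = [theta_start]
    for _ in range(steps):
        theta = path[-1]
        path.append(theta + score(xs, theta) / (n * j))
    return path

# Synthetic data (an assumption for illustration): Cauchy quantiles centered at theta_true.
n = 25
theta_true = 3.0
xs = [theta_true + math.tan(math.pi * ((i + 0.5) / n - 0.5)) for i in range(n)]

path = scoring_path(xs, theta_start=4.0, j=0.5)
for k, theta in enumerate(path[:5]):
    print(k, theta)
```

With data of this kind the successive reference values settle within a few steps. The step uses the expected information $n\,j$, as in the text, rather than the observed second derivative; for data that do not resemble a sample from the model, this iteration need not converge.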

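The corresponding results for a vector quantity $\boldsymbol{\theta}$ can be stated briefly.
Let $f(x:\boldsymbol{\theta})$ be a classical model satisfying the generalized Assumptions 1, 2,
and 4. Consider a multiple response vector $(x_1, \ldots, x_n)$ with $n$ large. For
a value $\boldsymbol{\theta}^*$ near the actual $\boldsymbol{\theta}^0$ the large sample model can be approximated by
the simple measurement model:

$$\mathbf{w} = \boldsymbol{\tau} + \mathbf{e}.$$

The model has an error variable $\mathbf{e}$ with a multivariate normal distribution with
mean 0 and covariance matrix $J^{-1}(\boldsymbol{\theta}^*)$; and it has a structural equation in which
a realized value from the error distribution provides the link between the
observed value of $\mathbf{w}$,

$$\mathbf{w} = \sqrt{n}\, \frac{J^{-1}(\boldsymbol{\theta}^*)}{n}\, l^{D}(\mathbf{x}:\boldsymbol{\theta}^*),$$

and the unknown value of

$$\boldsymbol{\tau} = \sqrt{n}\,(\boldsymbol{\theta} - \boldsymbol{\theta}^*).$$

The measurement model can be expressed directly in terms of the quantity $\boldsymbol{\theta}$:

$$\frac{J^{-1}(\boldsymbol{\theta}^*)}{n}\, l^{D}(\mathbf{x}:\boldsymbol{\theta}^*) = (\boldsymbol{\theta} - \boldsymbol{\theta}^*) + \frac{\mathbf{e}}{\sqrt{n}}$$

(note that the matrix multiplying the likelihood gradient vector is the inverse
of the Fisher information matrix,

$$n J(\boldsymbol{\theta}^*) = E\{l^{D}(\mathbf{x}:\boldsymbol{\theta}^*)\, l^{D\prime}(\mathbf{x}:\boldsymbol{\theta}^*):\boldsymbol{\theta}^*\} = E\left\{\sum_{1}^{n} l^{D}(x_i:\boldsymbol{\theta}^*)\, l^{D\prime}(x_i:\boldsymbol{\theta}^*):\boldsymbol{\theta}^*\right\},$$

for the multiple response $\mathbf{x}$).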

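Now consider the analysis of a response vector $(x_1, \ldots, x_n)$. The approximating
model at a reference value $\boldsymbol{\theta}^{(0)}$ may indicate that the maximum
likelihood value is elsewhere:

$$\boldsymbol{\theta}^{(1)} = \boldsymbol{\theta}^{(0)} + \frac{J^{-1}(\boldsymbol{\theta}^{(0)})}{n}\, l^{D}(\mathbf{x}:\boldsymbol{\theta}^{(0)}).$$

The model at $\boldsymbol{\theta}^{(1)}$ may indicate that the maximum is elsewhere:

$$\boldsymbol{\theta}^{(2)} = \boldsymbol{\theta}^{(1)} + \frac{J^{-1}(\boldsymbol{\theta}^{(1)})}{n}\, l^{D}(\mathbf{x}:\boldsymbol{\theta}^{(1)}).$$

Typically, a reference value $\boldsymbol{\theta}^*$ near the maximum point for the likelihood
may be obtained in several iterations.

Tests of significance can be made by using the approximating measurement
model. The structural distribution for $\boldsymbol{\theta}$ is multivariate normal with covariance
matrix $(nJ(\boldsymbol{\theta}^*))^{-1}$ and located at

$$\boldsymbol{\theta}^* + (nJ(\boldsymbol{\theta}^*))^{-1}\, l^{D}(\mathbf{x}:\boldsymbol{\theta}^*).$$
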

NOTES AND REFERENCES 

The likelihood function was promoted and developed by R. A. Fisher;
see Notes and References, Chapter Four. The proof in Section 1 that the
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The limiting normality of the location of maximum likelihood was 
presented by Fisher (1922) and given heuristic proof in his subsequent papers. 

It has been widely examined in the literature.

The limiting quadratic form of the likelihood function has received general
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253-260. 

Loeve, M. (1960), Probability Theory, Van Nostrand, Princeton. 
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and show that w* has a limiting normal distribution with location quantity t* and 
variance $1/j(\theta^*)$; the actual value for $\tau^*$ is $-\delta$.

4. Consider the simple measurement model

$$f(e)\, de,$$
$$x = \theta + e.$$

The corresponding classical model for the response variable $x$ is

$$f(x - \theta)\, dx.$$

And the corresponding model for a multiple response is

$$\prod f(x_i - \theta) \prod dx_i.$$

Simplify Assumptions 1, 2, and 4 and express them in terms of properties of the error
density.

5 ( Continuation ). Show that the likelihood function from x in the full model 

-R + (x) Jim ~ IT dx i 

is the same as the likelihood function from 9 = S(x) in the conditional model. 

6 (Continuation). Show that the location $\hat{\theta}(\mathbf{x})$ of maximum likelihood is a location variable.

Show that the conditional model for $\hat{\theta}(\mathbf{x})$ given the orbit is

$$k(\mathbf{d}) \prod f(\hat{\theta} + d_i - \theta)\, d\hat{\theta}.$$

7 (Continuation). Under Assumptions 1, 2, and 4 show that with probability 1 the limiting
conditional distribution of $\sqrt{n}\,(\hat{\theta}(\mathbf{x}) - \theta^0)$ is normal with mean 0 and variance $1/j(\theta^0)$.
Thus the conditional analysis, given orbit, applied to the classical location model agrees
in large samples with the likelihood analysis in Sections 2 and 3. (Fraser, 1964a,b.)


CHAPTER NINE

Precision and Information

The simple measurement model in Chapter One describes multiple measurements
on a real-valued quantity $\theta$; the error distribution of the measuring
instrument is known.

If the error probability distribution in the reduced model is broad and
diffuse, the measurements on $\theta$ can be called imprecise. Alternatively, if the
error probability distribution is narrow and concentrated, the measurements
can be called precise. This chapter examines the concept of precision for
measurement models, for structural models, and for classical models with
large-sample likelihood inference.

The simple measurement model produces a structural probability distribution
for the quantity $\theta$. A large value of the structural density function at a
certain value for $\theta$ is information in favor of that value or information for
that value. A small value of the structural density at a value for $\theta$ is information
against that value. This chapter also examines the concept of information
for measurement models and for structural models.

1 PRECISION: WITH A REAL-VALUED QUANTITY

Consider the simple measurement model but in a slightly generalized form.
Let $x_1$ be a first measurement on $\theta$ with corresponding error distribution
$f_1(e_1)\, de_1$, ..., and let $x_n$ be an $n$th measurement on $\theta$ with corresponding
error distribution $f_n(e_n)\, de_n$. The composite model,

$$\prod f_i(e_i) \prod de_i,$$

$$\mathbf{x} = [\theta, 1]\mathbf{e},$$

is a structural model.

Let r(x) be a location variable and d(x) be the reference point, the 
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The reduced measurement model is 


where $r$ designates a normal variable with mean 0 and variance $\sigma_R^2$.

Thus, with normal error components, the error variable in the reduced
model is also normal; and the reciprocal variance for the error in the reduced
model is the sum of the reciprocal variances for the components:

$$\frac{1}{\sigma_R^2} = \sum_{1}^{n} \frac{1}{\sigma_i^2}.$$

A small reciprocal variance gives a diffuse distribution and implies imprecise
measurement. A large reciprocal variance gives a concentrated distribution
and implies precise measurement.

For a normal error distribution the precision is defined to be the reciprocal
variance.

Let $j_R$ be the precision in the reduced model and $j_i$ be the precision of the
$i$th error component:

$$j_R = \frac{1}{\sigma_R^2}, \qquad j_i = \frac{1}{\sigma_i^2}.$$

Then, with normal error components in the measurement model, the error in
the reduced model is also normal and its precision is the sum of the component
precisions:

$$j_R = j_1 + \cdots + j_n.$$
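
As a small numerical illustration of the additivity of precisions, the following Python sketch combines measurements having different known normal error variances. It assumes, by analogy with the vector case in Section 2, that the location variable is the precision-weighted mean; the function name `combine` and the numbers are hypothetical.

```python
def combine(measurements, variances):
    """Return the precision-weighted mean and the combined precision j_R."""
    precisions = [1.0 / v for v in variances]      # j_i = 1 / sigma_i^2
    j_r = sum(precisions)                          # j_R = j_1 + ... + j_n
    location = sum(j * x for j, x in zip(precisions, measurements)) / j_r
    return location, j_r

# Three illustrative measurements with error variances 0.04, 0.01, 0.25.
location, j_r = combine([10.2, 9.8, 10.5], [0.04, 0.01, 0.25])
print(location, j_r, 1.0 / j_r)   # location, combined precision, reduced variance
```

The combined precision here is $25 + 100 + 4 = 129$, so the reduced error variance $1/129$ is smaller than the variance of the most precise single measurement.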

Now consider a sequence of response variables $x_1, \ldots, x_n$ with corresponding
classical models $f_1(x_1:\theta), \ldots, f_n(x_n:\theta)$ involving a real quantity $\theta$.
Suppose that some generalized assumptions are fulfilled that cover the
extension† of the limiting-likelihood results to variables with differing
distributions. Then, as the number of response variables approaches infinity,
the likelihood function approaches normal quadratic form and the model
admits approximation by the simple measurement model

$$\frac{\sum l_i^{(1)}(x_i:\theta^*)}{\sum j_i(\theta^*)} = (\theta - \theta^*) + r,$$

where $r$ designates a normal variable with mean 0 and precision $\sum j_i(\theta^*)$,
where

$$l_i^{(1)}(x_i:\theta^*) = \frac{\partial}{\partial\theta} \ln f_i(x_i:\theta^*),$$

$$j_i(\theta^*) = \operatorname{var}\{l_i^{(1)}(x_i:\theta^*):\theta^*\} = E\{-l_i^{(11)}(x_i:\theta^*):\theta^*\},$$

and where $\theta^*$ is in a small neighborhood of the actual $\theta^0$.

† The central limit theorem and the law of large numbers have extensions for differently
distributed variables; the details for the generalized assumptions are not of importance here.

Thus the error in the approximating model is normal and its precision
$j_R(\theta^*)$ is the sum of components $j_1(\theta^*), \ldots, j_n(\theta^*)$, one from each of the
component variables. The approximating model is as if each component variable
$x_i$ were a measurement variable with normal error and with precision $j_i(\theta^*)$.

For a classical model $f(x:\theta)$ satisfying Assumptions 1, 2, and 4 in Chapter
Eight, the precision at the value $\theta$ is defined to be the variance of the likelihood
derivative:

$$j(\theta) = \operatorname{var}\{l^{(1)}(x:\theta):\theta\}.$$

When independent variables involving the same quantity $\theta$ are combined,
the precisions are added to obtain the precision of the composite variable.
The precision for a large number of variables is equal to the precision of the
approximating normal measurement model. For a small number of variables,
specifically measurement variables, the conditional error distribution depends
typically on the orbit; the concept of precision is inadequate. A concept of
information in Section 3 is then needed to give a general description of the
measurement process.
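
The definition of precision as the variance of the likelihood derivative can be evaluated numerically for a location model. The Python sketch below is an illustration under stated assumptions: it uses a midpoint rule on a truncated range and a central difference for the derivative, and the two error densities (normal with standard deviation 2, and standard logistic) and all function names are choices made here, not taken from the text.

```python
import math

def precision(log_density, lo=-30.0, hi=30.0, steps=20000, h=1e-4):
    """Approximate j = E{ (d/d theta ln f(x - theta))^2 } at theta = 0 (midpoint rule)."""
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        e = lo + (k + 0.5) * width
        # Derivative in theta of ln f(e - theta) at theta = 0 is minus the derivative in e.
        score = -(log_density(e + h) - log_density(e - h)) / (2.0 * h)
        total += score ** 2 * math.exp(log_density(e)) * width
    return total

def normal_log_density(e, sigma=2.0):
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - e ** 2 / (2.0 * sigma ** 2)

def logistic_log_density(e):
    a = abs(e)  # the logistic density is symmetric; this form avoids overflow
    return -a - 2.0 * math.log1p(math.exp(-a))

j_normal = precision(normal_log_density)      # close to 1 / sigma^2 = 0.25
j_logistic = precision(logistic_log_density)  # close to 1/3
print(j_normal, j_logistic)
```

For the normal density the computed value agrees with the reciprocal variance, which is the sense in which the general definition extends the normal-error definition of precision. Because the mean of the likelihood derivative is zero, the second moment computed here is its variance.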


2 PRECISION: WITH A VECTOR-VALUED QUANTITY

Consider the simple measurement model extended to cover a vector
quantity $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)'$. For an $i$th measurement let

$$f_i(e_{1i}, \ldots, e_{ki})\, de_{1i} \cdots de_{ki} = f_i(\mathbf{e}_i)\, d\mathbf{e}_i$$

be the error distribution, and let

$$\mathbf{x}_i = \boldsymbol{\theta} + \mathbf{e}_i$$

be the structural equation; the group is the location group on $R^k$:

$$G = \{[\mathbf{a}, I]: -\infty < a_j < \infty\},$$

where

$$[\mathbf{a}, I]\mathbf{x} = \mathbf{a} + \mathbf{x}.$$
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The multiple model for n measurements is 

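$$\prod f_i(\mathbf{e}_i) \prod d\mathbf{e}_i,$$

$$(\mathbf{x}_1, \ldots, \mathbf{x}_n) = [\boldsymbol{\theta}, I](\mathbf{e}_1, \ldots, \mathbf{e}_n);$$

this is a slightly generalized form of the simple composite-measurement model
in Problem 8, Chapter Four.

Let $\mathbf{r}(\mathbf{x}_1, \ldots, \mathbf{x}_n)$ be a location variable:

$$\mathbf{r}(\mathbf{a} + \mathbf{x}_1, \ldots, \mathbf{a} + \mathbf{x}_n) = \mathbf{a} + \mathbf{r}(\mathbf{x}_1, \ldots, \mathbf{x}_n),$$

and let $\mathbf{d}_1, \ldots, \mathbf{d}_n$ be the corresponding deviations:

$$\mathbf{d}_i(\mathbf{x}_1, \ldots, \mathbf{x}_n) = \mathbf{x}_i - \mathbf{r}(\mathbf{x}_1, \ldots, \mathbf{x}_n).$$

The reduced model is

$$k(\mathbf{d}_1, \ldots, \mathbf{d}_n) \prod f_i(\mathbf{r} + \mathbf{d}_i)\, d\mathbf{r},$$

$$\mathbf{r}(\mathbf{x}_1, \ldots, \mathbf{x}_n) = \boldsymbol{\theta} + \mathbf{r}.$$

The $X$, $E$ notation would be simpler, but it is more convenient here to have
separate designations for the individual measurements.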

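Consider the case of normal error. Let $\mathbf{e}_i$ be multivariate normal with
known inverse covariance matrix $J_i$:

$$f_i(\mathbf{e}_i)\, d\mathbf{e}_i = \frac{|J_i|^{1/2}}{(2\pi)^{k/2}} \exp\{-\tfrac{1}{2}\mathbf{e}_i' J_i \mathbf{e}_i\}\, d\mathbf{e}_i.$$

The conditional error distribution is

$$g(\mathbf{r}:\mathbf{d}_1, \ldots, \mathbf{d}_n)\, d\mathbf{r} = k' \exp\left\{-\tfrac{1}{2} \sum_{1}^{n} (\mathbf{r} + \mathbf{d}_i)' J_i (\mathbf{r} + \mathbf{d}_i)\right\} d\mathbf{r}$$

$$= k'' \exp\left\{-\tfrac{1}{2}\mathbf{r}' \sum_{1}^{n} J_i\, \mathbf{r} - \mathbf{r}' \sum_{1}^{n} J_i \mathbf{d}_i\right\} d\mathbf{r}.$$

This reduced-model distribution is also multivariate normal; its inverse
covariance matrix is

$$J_R = \sum_{1}^{n} J_i,$$

but its location depends on the choice of reference point. A convenient
reference point is the center of the conditional distribution; this requires

$$\sum J_i \mathbf{d}_i = 0$$

and leads to

$$\mathbf{r}(\mathbf{x}_1, \ldots, \mathbf{x}_n) = J_R^{-1} \sum J_i \mathbf{x}_i$$
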
i 
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(the coordinates are weighted in proportion to inverse covariance matrices). 
With this choice for the location variable, the conditional distribution
becomes

$$g(\mathbf{r}:\mathbf{d}_1, \ldots, \mathbf{d}_n)\, d\mathbf{r} = \frac{|J_R|^{1/2}}{(2\pi)^{k/2}} \exp\{-\tfrac{1}{2}\mathbf{r}' J_R \mathbf{r}\}\, d\mathbf{r};$$

this distribution is the same on each orbit; and it is multivariate normal with
mean 0 and inverse covariance matrix

$$J_R = \sum J_i.$$

The reduced measurement model is

$$J_R^{-1} \sum J_i \mathbf{x}_i = \boldsymbol{\theta} + \mathbf{r},$$

where the error $\mathbf{r}$ is multivariate normal with inverse covariance matrix
$J_R = \sum J_i$.

For a normal error distribution the precision is defined to be the inverse
covariance matrix.

The precision for the $i$th component is $J_i$; the precision for the reduced
model is $J_R$:

$$J_R = \sum J_i.$$

Thus, with normal error components, the reduced model has normal error and
its precision is the sum of the component precisions.

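Now consider a sequence of response variables $x_1, \ldots, x_n$ with corresponding
classical models $f_1(x_1:\boldsymbol{\theta}), \ldots, f_n(x_n:\boldsymbol{\theta})$ involving the vector
quantity $\boldsymbol{\theta}$. Suppose that generalized assumptions are fulfilled that ensure the
limiting-normal likelihood function and the approximation by the simple
measurement model:

$$\left(\sum J_i(\boldsymbol{\theta}^*)\right)^{-1} \sum l_i^{D}(x_i:\boldsymbol{\theta}^*) = (\boldsymbol{\theta} - \boldsymbol{\theta}^*) + \mathbf{r},$$

where $\mathbf{r}$ designates a multivariate normal variable with mean 0 and precision
$J_R(\boldsymbol{\theta}^*) = \sum J_i(\boldsymbol{\theta}^*)$, where

$$l_i^{D}(x_i:\boldsymbol{\theta}^*) = \frac{\partial}{\partial\boldsymbol{\theta}}\, l_i(x_i:\boldsymbol{\theta}^*) = \frac{\partial}{\partial\boldsymbol{\theta}} \ln f_i(x_i:\boldsymbol{\theta}^*),$$

$$J_i(\boldsymbol{\theta}^*) = E\{l_i^{D}(x_i:\boldsymbol{\theta}^*)\, l_i^{D\prime}(x_i:\boldsymbol{\theta}^*):\boldsymbol{\theta}^*\},$$

and where $\boldsymbol{\theta}^*$ is in a small neighborhood of the actual $\boldsymbol{\theta}^0$.

Thus the error in the approximating model is normal, and its precision
$J_R(\boldsymbol{\theta}^*)$ is the sum of the components $J_1(\boldsymbol{\theta}^*), \ldots, J_n(\boldsymbol{\theta}^*)$. The approximating
model is as if each component variable $x_i$ were a measurement variable with
normal error and precision $J_i(\boldsymbol{\theta}^*)$.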

For a classical model satisfying generalized Assumptions 1, 2, and 4 in
Chapter Eight, the precision matrix at the value $\boldsymbol{\theta}$ is

$$J(\boldsymbol{\theta}) = E\{l^{D}(x:\boldsymbol{\theta})\, l^{D\prime}(x:\boldsymbol{\theta}):\boldsymbol{\theta}\};$$

an alternative expression involving second derivatives is given in Section 2,
Chapter Eight.

When independent variables involving a quantity $\boldsymbol{\theta}$ are combined, the
precision matrices are added to obtain the precision matrix for the composite
variable. For a large number of variables the precision is the precision of
the approximating normal measurement model.

3 INFORMATION: THE SIMPLE MEASUREMENT MODEL

Consider the simple measurement model

$$\prod f(e_i) \prod de_i,$$

$$\mathbf{x} = \theta\mathbf{1} + \mathbf{e},$$

and the corresponding reduced model

$$k(\mathbf{d}(\mathbf{x})) \prod f(r + d_i(\mathbf{x}))\, dr,$$

$$r(\mathbf{x}) = \theta + r.$$

With various error distributions and with various observed orbits, a
broad range of conditional error distributions is possible. The concept of
precision is useful with normal errors or with a large number of error components;
for other cases the more general concept of information is needed to
effectively assess measurements and the measurement process.

The structural probability element for $\theta$ given the measurement vector $\mathbf{x}$ is

$$g^*(\theta:\mathbf{x})\, d\theta = k(\mathbf{d}(\mathbf{x})) \prod f(x_i - \theta)\, d\theta;$$

this probability element gives the probability that the quantity is in a
neighborhood $d\theta$ of $\theta$. The level of this probability can be described conveniently
in logarithmic units: the information for the value $\theta$ given the measurement
vector $\mathbf{x}$ is

$$I(\mathbf{x}, \theta) = \ln g^*(\theta:\mathbf{x}) = \ln\left(k(\mathbf{d}(\mathbf{x})) \prod f(x_i - \theta)\right).$$

A large positive value for the information $I(\mathbf{x}, \theta)$ is information in favor of
the $\theta$ value; a large negative value is information against; the value 0
corresponds to unit structural density.
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Note that the difference in information between two 0-values is equal to 
the log-likelihood difference: 

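$$I(\mathbf{x}, \theta'') - I(\mathbf{x}, \theta') = \ln\left(k(\mathbf{d}(\mathbf{x})) \prod f(x_i - \theta'')\right) - \ln\left(k(\mathbf{d}(\mathbf{x})) \prod f(x_i - \theta')\right)$$

$$= \ln\left(\prod f(x_i - \theta'')\right) - \ln\left(\prod f(x_i - \theta')\right)$$

$$= l(\mathbf{x}:\theta'') - l(\mathbf{x}:\theta').$$

The information function is thus a representative log-likelihood function,
but it has vertical placement: it has a zero point on the vertical scale.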

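For the simple measurement model having normal error with variance $\sigma_0^2$,
the conditional error distribution (with location variable $r(\mathbf{x}) = \bar{x}$) is

$$\left(\frac{n}{2\pi\sigma_0^2}\right)^{1/2} \exp\left\{-\frac{n r^2}{2\sigma_0^2}\right\} dr.$$

The information for $\theta$ given $\mathbf{x}$ is then

$$I(\mathbf{x}, \theta) = \tfrac{1}{2} \ln\frac{n}{\sigma_0^2} - \tfrac{1}{2} \ln(2\pi) - \frac{n(\bar{x} - \theta)^2}{2\sigma_0^2}$$

(see Figure 1).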

Figure 1. The information $I(\mathbf{x}, \theta)$ for $\theta$, given $\mathbf{x}$ in the case of normal error; the maximum value is $\tfrac{1}{2}\ln(n/\sigma_0^2) - \tfrac{1}{2}\ln(2\pi)$.

The information $I(\mathbf{x}, \theta)$ describes information for $\theta$ given the measurement
vector $\mathbf{x}$. Consider now the measurement process and its overall effectiveness.
Let $\theta^0$ designate the actual value of the quantity. Then, for the process
of making $n$ measurements on the quantity $\theta^0$, let

$$I(\theta^0; \theta) = E\{I(\mathbf{x}, \theta):\theta^0\}$$

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be the information for the value 6 given the quantity 0°; it is the mean value of 
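the information $I(\mathbf{x}, \theta)$ at $\theta$ when the quantity has value $\theta^0$:

$$I(\theta^0; \theta) = \int \ln g^*(\theta:\mathbf{x}) \prod f(x_i - \theta^0) \prod dx_i$$

$$= \int \ln\left(k(\mathbf{d}(\mathbf{x})) \prod f(x_i - \theta)\right) \prod f(x_i - \theta^0) \prod dx_i$$

$$= \int \ln\left(k(\mathbf{d}(\mathbf{e})) \prod f(e_i - \delta)\right) \prod f(e_i) \prod de_i.$$

Note that the information at $\theta$ given $\theta^0$ depends only on the difference

$$\delta = \theta - \theta^0.$$

The information for the actual value $\theta^0$ is perhaps the most significant
indicator of the effectiveness of the measurement process: $I(\theta^0; \theta^0)$.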

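For the simple measurement model with normal error the information at $\theta$
given $\theta^0$ is

$$I(\theta^0; \theta) = \tfrac{1}{2} \ln\frac{n}{\sigma_0^2} - \tfrac{1}{2} \ln(2\pi) - \frac{n}{2\sigma_0^2}\, E\{(r - \delta)^2\}$$

$$= \tfrac{1}{2} \ln\frac{n}{\sigma_0^2} - \tfrac{1}{2} \ln(2\pi) - \tfrac{1}{2}\left(1 + \frac{n\delta^2}{\sigma_0^2}\right)$$

(see Figure 2). The information for the actual value is

$$I(\theta^0; \theta^0) = \tfrac{1}{2} \ln\frac{n}{\sigma_0^2} - \tfrac{1}{2} \ln(2\pi) - \tfrac{1}{2};$$

it is an increasing function of the precision $n/\sigma_0^2$. Thus with normal error, the
information for the actual value and the precision are one-to-one monotone
functions each of the other.

Figure 2. The information at $\theta = \theta^0 + \delta$, given the actual value $\theta^0$.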
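
The normal-error formulas can be checked with a short Python sketch. It assumes the expressions for $I(\mathbf{x}, \theta)$ and $I(\theta^0; \theta)$ in the normal case as reconstructed here from the definitions; the function names and the data are illustrative assumptions. The first check confirms that a difference in information equals the corresponding log-likelihood difference.

```python
import math

def information(xs, theta, sigma0):
    """I(x, theta): log of the structural density of theta, normal error case."""
    n = len(xs)
    xbar = sum(xs) / n
    return (0.5 * math.log(n / sigma0 ** 2) - 0.5 * math.log(2.0 * math.pi)
            - n * (xbar - theta) ** 2 / (2.0 * sigma0 ** 2))

def log_likelihood(xs, theta, sigma0):
    """l(x: theta) for independent N(theta, sigma0^2) measurements."""
    return sum(-0.5 * math.log(2.0 * math.pi * sigma0 ** 2)
               - (x - theta) ** 2 / (2.0 * sigma0 ** 2) for x in xs)

def mean_information(n, sigma0, delta):
    """I(theta0; theta0 + delta): mean information at displacement delta."""
    return (0.5 * math.log(n / sigma0 ** 2) - 0.5 * math.log(2.0 * math.pi)
            - 0.5 * (1.0 + n * delta ** 2 / sigma0 ** 2))

xs = [4.1, 3.7, 5.2, 4.4, 4.9]   # illustrative measurements
sigma0 = 0.8                     # assumed known error standard deviation
info_diff = information(xs, 4.6, sigma0) - information(xs, 4.0, sigma0)
lik_diff = log_likelihood(xs, 4.6, sigma0) - log_likelihood(xs, 4.0, sigma0)
print(info_diff, lik_diff)
print(mean_information(5, sigma0, 0.0), mean_information(10, sigma0, 0.0))
```

The last line illustrates that the information for the actual value increases with the precision $n/\sigma_0^2$, while any nonzero displacement `delta` lowers the mean information.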

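Consider some properties of the information $I(\theta^0; \theta)$.

The information has a maximum value at the actual $\theta^0$:

$$I(\theta^0; \theta) - I(\theta^0; \theta^0) = E\{I(\mathbf{x}, \theta) - I(\mathbf{x}, \theta^0):\theta^0\}$$

$$= E\{l(\mathbf{x}:\theta) - l(\mathbf{x}:\theta^0):\theta^0\} < 0;$$

the inequality follows from Lemma 1 in Chapter Eight.

The curvature of the information at the actual value is equal to the precision.
Suppose Assumptions 1, 2, and 4 in Chapter Eight hold; then

$$-\frac{\partial^2}{\partial\theta^2}\, I(\theta^0; \theta)\Big|_{\theta^0} = -E\left\{\frac{\partial^2}{\partial\theta^2} \ln\left(k(\mathbf{d}(\mathbf{x})) \prod f(x_i - \theta)\right)\Big|_{\theta^0} : \theta^0\right\}$$

$$= E\{-l^{(11)}(\mathbf{x}:\theta^0):\theta^0\}.$$
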

*4 INFORMATION: THE STRUCTURAL MODEL

Consider the structural model

$$f(E)\, dE,$$

$$X = \theta E,$$

and the corresponding reduced model

$$g([E]:D(X))\, d[E],$$

$$[X] = \theta[E];$$

the quantity $\theta$ is an element of a group $G$ (Assumption 3, Chapter 2), and the
conditional error distribution is

$$g([E]:D)\, d[E] = k(D)\, f([E]D)\, d[E].$$

The structural probability element for the quantity 0 is 

V"\ uo r\r r r a I v\ t /fl-1 v\ Jl([X]) do 


g*(0 : X) dd = k{D(X)')f(d~ 1 X)J N (d~ x X) 


J J L ([X])Jt(d) 
t(Bm)/(rtwr'j) dv(ey. 
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the right invariant differential dv{8) for 8 corresponds to the left invariant 
differential dp([E]) for [E]. The density function with respect to dv(d) can be 
described conveniently in logarithmic units : the inf ormation for the value 8 
given the observation X is 

l(X, 6) = lnh(D(X))/(0 , X)J lV (O- I X)^||||). 

A large positive value is information for the 6 value; a large negative value is 
information against the 0 value. 

The difference in information between two 8 values is equal to the likeli- 
hood difference for the two 6 values : the classical model for the response X is 

the log-likelihood function is 

l(X:d) = R(X) + In (j\8~ l X)J N (8^[X\j)-, 
the information difference from 8' to 6" is 

I(X, 8'j - I{X, O') = In (f(d"~ 1 X)J N (d"- 1 [X])) - In (f(d'- l X)J N (Q'~ x [X])) 
= l(X-.d") - l(X\8j, 

which is the likelihood difference from $\theta'$ to $\theta''$.

The information $I(X, \theta)$ describes information for $\theta$ given $X$. Let $\theta^0$ denote
the actual value of the quantity. Then the information for $\theta$ given $\theta^0$ is defined
as

$$I(\theta^0; \theta) = E\{I(X, \theta):\theta^0\}$$

= fin -f(e^x)j N (er^dx. 

The information for 8 has a maximum value at the actual 8°. 

1(8° ; 8) - 1(8° ; 8°) = E{l(X : 8 ) - l(X :8°) : 8°} 

^L r^ ^Uo; 
l fp-'xvrfp- 1 ) I 

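the inequality follows from Lemma 1 in Chapter Eight.

The curvature matrix of the information at the actual value is equal to the
precision matrix. Suppose the generalized Assumptions 1, 2, and 4 in Chapter
Eight hold; then the $jj'$ element in the curvature matrix is

$$\left[-\frac{\partial^2}{\partial\theta_j\, \partial\theta_{j'}}\, I(\boldsymbol{\theta}^0; \boldsymbol{\theta})\right]_{\boldsymbol{\theta}^0} = -E\left[\frac{\partial^2}{\partial\theta_j\, \partial\theta_{j'}}\, l(X:\boldsymbol{\theta}^0):\boldsymbol{\theta}^0\right]$$

$$= -E\{l^{(jj')}(X:\boldsymbol{\theta}^0):\boldsymbol{\theta}^0\},$$

which is the $jj'$ element in the precision matrix.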
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PROBLEMS 


1. Show that the precision of the Cauchy error distribution

$$\frac{1}{\pi}\, \frac{de}{1 + e^2}, \qquad -\infty < e < \infty,$$

is equal to the precision of the normal error distribution with mean 0 and variance 2. Use
the integral $\int_{-\infty}^{\infty} (1 + t^2)^{-2}\, dt = \pi/2$. (Fisher, 1922.)

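2. Consider the simple measurement model with uniform error:

$$f(e) = \begin{cases} 1, & -\tfrac{1}{2} < e < \tfrac{1}{2}, \\ 0, & \text{otherwise.} \end{cases}$$

(i) Show that

$$I(\mathbf{x}, \theta) = \begin{cases} -\ln(1 - R), & |r(\mathbf{x}) - \theta| \le \tfrac{1}{2}(1 - R), \\ -\infty, & \text{otherwise,} \end{cases}$$

where the location variable is

$$r(\mathbf{x}) = \frac{\max x_i + \min x_i}{2}$$

and $R = \max x_i - \min x_i$.

(ii) Show that

$$I(\theta^0; \theta) = -\infty, \quad \text{if } \theta \neq \theta^0.$$

(iii) For one measurement on $\theta$ show that

$$I(\theta^0; \theta^0) = 0;$$

and for two measurements on $\theta$ show that

$$I(\theta^0; \theta^0) = \tfrac{1}{2}.$$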

3. Consider the simple measurement model with location variable $r(\mathbf{x}) = x_1$ and

$$\mathbf{d}(\mathbf{x}) = (0, d_2(\mathbf{x}), \ldots, d_n(\mathbf{x})) = (0, x_2 - x_1, \ldots, x_n - x_1).$$

The error distribution in terms of $(e_1, \ldots, e_n)$ can be expressed in terms of $e_1, d_2, \ldots, d_n$:

$$\prod f(e_i) = g(e_1:\mathbf{d})\, h(d_2, \ldots, d_n).$$

7(0°; 0°) -7®- /f; 


the information at the actual value is the Shannon information in the original error distribu- 
tion less the Shannon information in the orbital variable (whose value can be observed). 



Answers to Selected Problems 


CHAPTER ONE 

1. (i) Reduced model: $e = 0.7906z$; $61.5 = \theta + e$.

(ii) $\Pr\{-1.55 \le e \le 1.55\} = 95\%$;

$\Pr\{-2.04 \le e \le 2.04\} = 99\%$.

(iii) The quantity $\theta$ is normal with mean 61.5 and standard deviation 0.7906.

$\Pr\{59.95 \le \theta \le 63.05\} = 95\%$; $\Pr\{59.46 \le \theta \le 63.54\} = 99\%$.

3. (i) Reduced model: $g(e_1)\, de_1$; $157.01 = \theta + e_1$, where $g(e_1)\, de_1 = 50\, de_1$ on $(-0.5, -0.48)$
and $= 0$ otherwise. Structural distribution: uniform on the interval (157.49,
157.51).

(ii) Reduced model and structural distribution as in (i); no.

(iii) The hypothesis $\theta = 157.60$ leads to $e_1 = -0.59$, a value outside the range of the
error probability distribution; within the model the hypothesis is effectively denied.

5. (i) Reduced model: $g(r)\, dr$; $r(\mathbf{x}) = \theta + r$, where $g(r)\, dr = n \exp\{-nr\}\, dr$ on $(0, \infty)$
and $= 0$ otherwise.

(ii) The structural distribution is $n \exp\{-n\,r(\mathbf{x}) + n\theta\}\, d\theta$ on $(-\infty, r(\mathbf{x}))$ and 0
elsewhere.

9. The hypothesis leads to the value 2.4 for $\chi_3/\sqrt{3}$ or the value 17.28 for $\chi_3^2$. The 1 and
½% points are 11.3 and 12.8; within the model there is strong evidence that the hypothesis
is not true.

12. The reference point has one $d = 0$, the remaining $d$'s greater than 0, and the sum of
the remaining $d$'s equal to $n - 1$.

(i) Reduced model: $g(b, s)\, db\, ds$; $[b(\mathbf{x}), s(\mathbf{x})] = [\mu, \sigma][b, s]$, where

$$g(b, s) = n \exp\{-nb\}\, \frac{(n - 1)^{n-1}}{\Gamma(n - 1)} \exp\{-(n - 1)s\}\, s^{n-2}, \qquad 0 < b < \infty, \quad 0 < s < \infty.$$


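(ii) The distribution of $t = b(\mathbf{e})/s(\mathbf{e})$ is

$$g_T(t) = n\left(1 + \frac{nt}{n - 1}\right)^{-n}, \qquad 0 < t < \infty.$$

The structural distribution for $\mu$ is

$$g_T\left(\frac{b(\mathbf{x}) - \mu}{s(\mathbf{x})}\right) \frac{d\mu}{s(\mathbf{x})}, \qquad -\infty < \mu < x_{(1)}.$$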
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15. (i) Reduced model: g(b, s) db ds; s(x)] = [/«, a][b, j], where 

g{b , s) = k{&)p n Y\(b + sd^P- 1 exp { — ]£(6 + sd^js 71 - 2 

on 0 < 6 < oo, 0 < .? < co, and = 0 otherwise. The normalizing constant (partly 
evaluated) is 

--("•“jrassHT- 

(ii) The structural distribution for [g, tr] is 

„/*<D - p s(x)\ s(x) 


where g(b, s) is given in (i). 

22. (i) The orbits are rays from the origin (origin excluded). The reference points are the
points of intersection of orbits with the plane $\sum x_i = n$.

(ii) The error probability distribution is $\Gamma^{-1}(n)\, n^n \exp\{-ne\}\, e^{n-1}\, de$ on $(0, \infty)$.

(iii) The structural distribution for $\theta$ is

$$\frac{n^n}{\Gamma(n)} \exp\left\{-\frac{\sum x_i}{\theta}\right\} \left(\frac{\sum x_i}{n\theta}\right)^{n} \frac{d\theta}{\theta}.$$

CHAPTER TWO 

12. (i) Suppose g ∈ g₁H, g₂H. Then g = g₁h₁ = g₂h₂ with h₁, h₂ ∈ H; g₁H = gh₁⁻¹H =
gH = gh₂⁻¹H = g₂H. Otherwise, g₁H and g₂H are disjoint. It follows that {gH : g ∈ G} is a
partition of G.

(ii) The sets gH, Hg each contain g. Left partition is equal to right partition ⟷ gH = Hg
for all g ⟷ H is normal.
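
Both parts can be illustrated on a small case. The Python sketch below uses the permutations of three objects, a group chosen here for illustration and not taken from the problem, and checks that left cosets partition the group and that left and right partitions agree exactly when the subgroup is normal. All names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Return the product permutation that applies q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

G = list(permutations(range(3)))  # all permutations of three objects

def left_cosets(H):
    return {frozenset(compose(g, h) for h in H) for g in G}

def right_cosets(H):
    return {frozenset(compose(h, g) for h in H) for g in G}

def is_partition(blocks):
    """True if the blocks are pairwise disjoint and together cover G."""
    return sum(len(b) for b in blocks) == len(G) and set().union(*blocks) == set(G)

H_swap = [(0, 1, 2), (1, 0, 2)]            # identity and one transposition
H_rot = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]  # identity and the two 3-cycles

print(is_partition(left_cosets(H_swap)))            # left cosets partition G
print(left_cosets(H_swap) == right_cosets(H_swap))  # partitions differ: not normal
print(left_cosets(H_rot) == right_cosets(H_rot))    # partitions agree: normal
```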

13. Let H = H_α, where i ∈ H_α. For h in H: hH contains h; hH ∈ {H_α}; hence hH = H.
For h in H: hH = H contains i; h⁻¹ ∈ H. Thus H contains i and is closed under formation
of products and inverses: H is a subgroup. It follows that {H_α} consists of left cosets of H.


CHAPTER THREE 


be a general element in G: 


(f 1 

~4y 


10. Let g 


.r 6jr rllr 

Thus gL = Lg for all g and L is normal.
12. The analysis-of-variance table is

Source       Dimension    Component      Structure of Component

Mean             1        814.088000     (α₁√5 + σz₁)²
Linear           1          1.102811     (α₂√0.148 + σz₂)²
Quadratic        1          0.003681     (α₃√0.003670 + σz₃)²
Residual         2          0.005508     (σχ₂)²

Total            5        815.200000
(i) On hypothesis β₃ (= α₃) = 0, F = 0.003681/(0.005508/2) = 1.337 is an observed
value of F on 1 over 2, a reasonable value; data in accord with hypothesis.

(ii) On assumption α₃ = 0, the quantity α₂ is now β₂, based on two structural vectors;
the fitted general level is 5.06216 + 2.72973x = 12.76 + 2.72973(x − 2.82); the structural
distribution is

$$\beta_2 = 2.72973 - \frac{\sqrt{0.005508}}{\sqrt{0.148}} \cdot \frac{z_2}{\chi_2} = 2.730 + t_2^{*}(0.136),$$

where t₂* is ordinary t on 2 (precautionary analysis using residual length from three
structural vectors).
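
The figures in this answer can be recomputed directly. The Python sketch below is a check, not part of the original solution. It assumes the Linear component reads 1.102811, the value that makes the components sum to the stated total 815.200000, and that 0.148 is the factor under the square root in the Linear row. Variable names are hypothetical.

```python
import math

# Figures quoted in the answer to problem 12 (Linear value as assumed above).
components = {"mean": 814.088000, "linear": 1.102811,
              "quadratic": 0.003681, "residual": 0.005508}
residual_dim = 2
linear_factor = 0.148                  # factor under the root in the Linear row
intercept, slope, centre = 5.06216, 2.72973, 2.82

total = sum(components.values())
residual_ms = components["residual"] / residual_dim
f_ratio = components["quadratic"] / residual_ms       # F on 1 over 2
t_scale = math.sqrt(residual_ms / linear_factor)      # multiplier of the t variable
level_at_centre = intercept + slope * centre          # fitted level at x = 2.82

print(f"total = {total:.6f}, F = {f_ratio:.3f}")
print(f"t scale = {t_scale:.3f}, level at centre = {level_at_centre:.2f}")
```

The t multiplier follows from z₂/χ₂ = t/√2, which turns √0.005508/√0.148 into √(0.005508/2)/√0.148.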

17. (i) The convenient transformation variable and reference point are 

where b(y) is the vector of regression coefficients of y on the row vectors in V and d(y) is 
the unadjusted residual vector. 

(ii) dm(y) = ∏ dy_i; dμ(g) = dν(g) = ∏ db_u; Δ(g) = 1.

(iii) Error: k(d) ∏ f(Σ b_u v_ui + d_i) ∏ db_u. Structural: k(d) ∏ f(y_i − Σ β_u v_ui) ∏ dβ_u.

20. (i) The analysis of variance table is

Source       Dimension    Component

Mean                      32,961.63
A                             22.38
Residual                      16.99

Total                     33,001.00

(ii) The structural distributions are described by

μ₃ − μ₂ = 2.22 + 0.7431 t₁₀*,

μ₂ − μ₁ = 1.90 + 0.9217 t₁₀**,

where t₁₀*, t₁₀** are ordinary t variables on 10 degrees of freedom.

26. The marginal structural distributions are 

*2(|i: Y)diL = k(D(Y))s^~\Y) ■ ■ ■ sf+^^KY) 


n/ -e- 1 


Vii ~ Ml 


T n+J> . . . crn+1 
T U) °(3>) 


L \Jlpi Mj» J J 

g*C6:Y)dTS = k(D(Y))sJ$r-*(Y) - • ■ sfof^^CY) 


n/ th 


Vli - Pi 


dJ5 

yn+v . . . a n + 1 


Vvi ~ M: 
27. The marginal structural distributions are

= dn=E n dV ~ n u. 

gL( A n |r„(y)nr„(J0| v |r(y)| A 


g*(S: Y)d3 3 


exp {-^trS- 5(F)} 


<J>> ^ ' o-n+jr-l . . . ff n ’ 

CT (1) CT (d) 


where T_μ(Y) is the positive lower triangular square root of
S(Y) + n(m(Y) − μ)(m(Y) − μ)′.

29. (ii) Let Y = [Y] D(Y) be the positive lower triangular and orthogonal factorization
of Y:

' h*> (r) ° I fdi(y)1 

r 21 (y) s i2) (Y) 

[Y]= ‘ ,y>(r)= • ; 


UiOO ••• Vi (n W y )J 

t_{j1}(Y), . . . , t_{j,j−1}(Y) and s_{(j)}(Y) are the regression coefficients and residual length of y_j
on d₁(Y), . . . , d_{j−1}(Y); d_j(Y) is the unit residual vector.

(iii) dp-i = d i- 

31 - °> «p t-KH. + Z'JWMS 1 ■ • • *7’ n *oi n '*»■ ■ 

< U > «xp{-i.rS?|5|'-»-^g. 

oil) '“* | 2 t) 1v +1 «p {-* » ^ 5 < r » 


. 'iiOtMllT 
ff (D ’ ' ' J 


(y)--i)(y) ^ 


(iv) 

l OTr 2 ^(yj-'-^y) di: 

|2| n/2 afuC y) • • • jfrtC y) 2*(ff* 1) - • ■ er^,) 2 * 

36. (i) g L (H: D ) dH = ifc(Z»J f(T(HV + D)) dT ■ dH, 

rf(»: y) = ^^/(IT’-Hyxy - ®^)) dT ' *»• 
dX , da dc dadc 1

(iii) dm(x) Mg) =^T> Mg) = — , Mg) = ^ • 

(iv) g([E]:D:j 3) d[E] = k p (D) II f(e x + sd u , . . . , e p + rfe,- ds. 


g*{6:X:p)dd = . 

(v) The marginal likelihood is R + (D)lkp(D). 


5. (i) n 


tr nj) ^(AQ cr 


2 ir5 2 i (n7, - p)/2 - 1 ^ 2 


r((«/> -jp)/ 2) 


f 5 2 wiP 2 wT 


T[(np -p)/2] 


f 5 2 (Z)lG 2 (X)1 (nM>/2 ^ 2 

ex p(-^-jL^j ^ 

10. (i) The structural distribution for θ given λ is

k(d x )Hf(x j — 6 ) dd. 

(ii) The marginal likelihood for λ is

1 „ I dx} I C f dx}~) _2 ~') V 


— II _J 
&(d A ) dx. 


1 f (dxMM'A 

■ 2 ^ 
i J J 


(iii) The marginal likelihood for λ (normal error) is

£t<$ exD /_^|nl?i|fs 


r*n-n* 

\dx i \ J 


12. (i) The structural distribution for β given λ is

k(D x )U.f(y x - y,f3 u v ui )Tl dfi u . 

(ii) The marginal likelihood for λ is

R + (D X ) |/(y:A)( 
k(D x ) [K/- 2 (y:A) V'\~A' 

13. (i) The structural distribution for β given λ is

I VV'\ l A ( 1 . 


|~ 2 exp j-~(p- b(y*))'FF'(f3 - b(y*))|rfp. 


{InolYI* \ 2 Cl ^ W v 

(ii) The marginal likelihood function is 

R + (D X ) ( 1 S(^) 2 ) |J(y:A)| 


2 o% j \ YM(y.l)V'\-M 


CHAPTER FIVE 
1. (i) The distribution of error position is 
f n *)M ( ne}\ 
(ii) g s {C: D ) dC = k(D) f(BV + CD) dB - \C\ n - v ~ T dC,

Jb 

r , , N |c(y)| n - r dT 

£g(r ; Y) dT = k(D)\ f(BV + r- 1 C{Y)D) dB • [r|n l r ^ . 

» , , jc(y)| n - r ^3itTG 

21. (i) k(D) IT - *V)) L |-^ psj^- • 

» , , . |C(y)| n - r ^/55rfS 

(ii) /c(D) n (X- ®*0) • 

22. <0 gS exp {-i | bjKK'b,} ft ^ exp HH, + *},.)} 

_ _ rfO 

„n— 2 — 1 . . . c n — r—j> TT . TT At ... • 

5 (1) ^(p) 11 “bi' j . . . A 

- rx j> ■ ri 2 

The marginal distribution of (B, T) is given by the first two parts of the preceding
distribution.

(ii) A «~r ' ~ ~ An - r -? .i l exp {-* tr S} |5|(n-r-i2-i)/2 d l . 

v ' (2 ir ) ( n-np/2 r 1 J 1 1 2 P 

i t/i/'Ip/2 ivi— r/2 

(iii) 1 J, — exp{ — itrS-H^y)- &)VV'(B{Y)- 3i)'}d3i 

(2t ry p > c ‘ 

A ■ ■ ■ A . |5 , (y)| (7, — r) / 2 d'G dO. 

A n—T ■ /i n—T—v + 1 W n / _ i tr y.-i yA L- - 

__ — \tn-T)'D r2 eX Pl 2 tr - 4 IV.I(n-T)/2 I 7?l . A . . . A ' 


(27r) ln_T,I>/2 
•/L . 


V |S|(«-r)/2 | TS| A • • • y< 2 

Kr |P/2| S( y)|-r/2 


/•a r r-p+1 1 1 ! _ _ — d3S\ 

} An-" A n- V + 1 l ; + W 10 - ®) ^'(*< y) - ^)r /2 


"n "n-jH-l 
/t ...a |5(y)| (n - r) / 2 t/TS 

A n-r A n-r- p+1 tr Y,-l Ct VYl ' A— . 

_.^ !n expp 4 tr )) | -s| A 

SJ , |^(n|'-rt _/e a 

w ( 27r )(n-7-)p/2 eX Pt * lr | 2 |(n-r+p+l )/2 2 J> 

23. The simplified matrix-variate t-distribution:

A n _ r ---A m+l ww* 

A--- A n _ v+ , | / + HVV'HT 12 


CHAPTER SIX 

2. l(x, θ₀) = −θ₀ ln(ln x). Linearization with respect to θ* = −ln θ gives l* = ln ln x
with distribution function 1 − exp{−exp{l* − θ*}}.

CHAPTER SEVEN 

3. The squared length of the difference vector is 15.948037; χ² = 63.792. The 1 and ½%
points for χ² on 5 are 15.086 and 16.750; there is strong evidence against the hypothesis
of symmetry.
290

Composite response model, 192 
Composite structural equation, 9, 26, 52 
Conditional distribution, 72 
Conditional probability distribution, 14, 
32, 59 

Conditional structural model, 188, 203 
Conditioned model, 71 
Continuous Poisson distribution, 258
Coordinates, change of, 59 
Correlation coefficient, 193, 195 
Coset, 65, 67 
left, 81 
right, 81 


Decomposition of error distribution, 72 

Decreasing determinant, 148 

Determination, 18 

Diagonal transformation, 29 

Direct product, 192

Double exponential error, 43, 44 

Error characteristic, 38, 67, 68 
Error distribution, 4, 5, 21, 50, 58 
factorization, 73 

Error probability distribution, 14, 32, 53, 
59 

Error variable, 3, 21, 49, 50 
Euclidean manifold, 54 
Event, 11

Exponential error, 43, 44, 46 
Extended 1-vector, 6 

Factor group, 81 
Fisher information, 304 
matrix, 310 
Form, 190 

Frequency models, large samples, 270 

Generalized positive affine group, 47 
Genotype, 284 
Group, 6 
affine, 6, 22 

generalized positive affine, 47 
location, 6, 22, 75, 213, 220, 232, 321 
location-positive linear, 80 
location progression, 79, 141, 235 
location rotation, 218 
location scale, 43, 212 
positive affine, 21, 22, 43, 80, 192, 227 
positive linear, 80, 232, 246 
progression, 78, 142, 177 
regression-positive linear, 47, 249 
regression progression, 179, 250 
regression scale, 88, 92, 98
rotation, 197, 220, 221, 235, 244, 245, 
250 

scale, 22, 76 
scale and shear, 78 
shear, 77 

translation, 6, 22, 75, 213, 220 
translation rotation, 218 

Heine-Borel theorem, 302 
Homogeneity, 10, 26, 276 
Hypergeometric model, 268, 291, 293 
Hypothesis, 18, 37, 64 

Identity transformation, 5 
Increasing determinant, 148 
Indicator function, 36 
Inference, 19, 39, 63 
likelihood, large sample, 311 
Information, 325, 326, 328 
Inner product, 101 
error, 153 
matrix, 251 
quantity, 153 
Instrument, 3 
Internal error, 17, 49 
Invariant differential, 30, 55, 57 
factorization, 73 
left, 59 
right, 60 

Inverse transformations, 5 

Lattice point, 276 
Left coset, 65 

Left invariant differential, 59 
Likelihood difference, 296 
Likelihood function, 185, 295 
far from the quantity, 295 
large sample, 307 
limiting form, 311
near the quantity, 302 
Likelihood iteration, 314, 315 
Likelihood modulation, 196 
Likelihood ratio, 187 
Linearized Poisson, 257 
distribution of, 260 
Linearized variable, 253 
Linear multivariate model, 247 
Linear subspace, 6 
Line of support, 297 


Linkage, 279 
Location, 23 

Location distribution, 39, 241 
Location group, 6, 22, 75, 213, 220, 232,
321

Location model, 42 
Location-positive linear group, 80 
Location-progression group, 79, 141, 235 
Location-rotation group, 218 
Location-scale group, 43, 212 
Location-scale model, 42 
Location variable, 7 
Log-likelihood function, 187 

Marginal distribution, 72 
Marginal likelihood function, 190, 202, 
205, 256 

Marginal probability element, 189, 204 
Marginal structural distribution, 217 
Measurement, 4 

Measurement model, 21, 190, 243 
with additional quantity, 214 
on the circle, 196
on the sphere, 220 
Model, affine multivariate, 225, 248 
binomial, 267, 282 
composite measurement, 211 
composite multinomial, 268, 284, 290 
composite response, 192 
conditional structural, 188, 203 
hypergeometric, 268, 291, 293 
linear multivariate, 247 
location scale, 42 
measurement, 21, 190, 243 
measurement with additional quantity, 
214 

measurement on the circle, 196 
measurement on the sphere, 220 
multinomial, 268, 279 
multiplicative measurement, 45, 211
multivariate, 225 
Poisson, 267, 293 
progression, 139 
regression, 85, 97 
regression linear, 248 
regression progression, 178 
regression with additional quantity, 215 
simple composite measurement, 212, 
217, 243, 322 

simple measurement, 4, 210, 244, 259 


simple measurement with additional 
quantity, 213 
simple progression, 178 
simple regression, 171, 275 
simple regression with additional quantity, 214

stochastically monotone, 252 
structural, 50 

transformed measurement model, 214 
transformed regression, 205 
transformed simple measurement, 213 
transformed simple regression model, 
214 

Modular function, 62 
Multinomial model, 268, 279 
Multiplicative measurement model, 45, 
211 

Multivariate model, 225 

Natural response, 203 
New coordinates, 9 
Nikodym derivative, 54
Noncentral t-distribution, 211
Normal error, 15, 32, 42, 45, 46, 132, 
150, 206, 210, 212, 213, 214, 215, 
224, 237, 243, 244, 245, 246, 247, 
248, 251, 319 
bivariate, 192, 193 

Normal error distribution, for the circle, 
200 

for the sphere, 224 
Normal subgroup, 81 

Observation, 18 
Orbit, 6, 22, 23, 25, 50 
Orthogonal basis, 106 
Orthogonal component, 238 
Orthogonality conditions, 101 
Orthogonality equations, 101 
Orthonormal set, 124 
Outside information, 68 

Pareto distribution, 263 
Partition, 22, 48, 50, 65 
Period of oscillation, 170 
Phenotype, 284 
Poisson basis, 268 
Poisson distribution, 257, 258, 264 
Poisson model, 267, 293 
Position, 7, 23, 51, 276 


Positive affine group, 21, 22, 43, 80, 192, 
227 

Positive linear group, 80, 232, 246 
Positive orthogonal group, 197, 220, 221, 
235, 244, 245, 250 
Precision, 320, 327 
Precision matrix, 323 
Primary quantity, 188 
Probability for an unknown constant, 10 
Probability function, 185 
Process, 17 

Product of transformations, 5 
Progression group, 78, 142, 177 
Progression model, 139 
Projection, 99 
Pythagorean theorem, 103 

Quantity, 4, 18, 49, 50 

Realized error, 50 
Realized value, 4 

Reduced model, 14, 32, 53, 59, 186, 189 
Reduction, 11, 27 
Reference direction, 197, 220 
Reference point, 8, 25, 51, 276 
change of, 60 

Regression coefficients, 102 
Regression-linear model, 248 
Regression model, 85, 97 
with additional quantity, 215 
Regression-positive linear group, 47, 249 
Regression-progression group, 179, 250 
Regression-progression model, 178 
Regression-scale group, 88, 92, 98 
Residual length, 103 
Residual vector, 101, 103 
Response level, 21 
Right coset, 67 

Rotation group, 197, 220, 221, 235, 244, 
245, 250 

Rotational symmetry, 219, 234, 247, 250

Scale and shear group, 78 
Scale distribution, 39, 242 
Scale group, 22, 76 
Scale variable, 45 
Scaling, 23 

Semidirect product, 69 
Shannon error, 330 
Shape, 190
Shear group, 77 

Simple composite measurement model, 
212, 217, 243, 322 

Simple measurement model, 4, 210, 244, 
259 

with additional quantity, 213 
limiting form, 317 
Simple progression model, 178 
Simple regression model, 171, 275 
with additional quantity, 214 
Stabilizer subgroup, 216 
Stirling’s formula, 259 
Stochastically monotone model, 252 
Strictly convex, 297 
Structural distribution, 20, 41, 63 
combining, 84 
factorization, 74 

Structural equation, 4, 5, 14, 21, 32, 50, 
53 

composite, 9 
Structural model, 50 
Structural probability, 20, 41, 64, 256 
Structural vectors, 96, 274 
orthogonal, 106 
Subgroup, 22 
Subspace, affine, 7 
linear, 6 
System, 17 


t-distribution, 40
matrix valued, 251 
simplified, 134 
Three-factor design, 176 
Transformation groups, see Group 
Transformations, 5 
Transformation variable, 25, 51 
Transformed measurement model, 214 
Transformed regression model, 205 
Transformed simple measurement model, 
213 

Transformed simple regression model, 214 
Translation, 6, 8 

Translation group, 6, 22, 75, 213, 220 
Translation-rotation group, 218 
Triangular component, 238 
Triangular orthogonal factorization, 124 
Triangular square-root factorization, 125 
t-variable, matrix valued, 251
simplified, 134 
Two-factor design, 174 

Uniform error, 36, 43, 44, 46, 329 
Uniform shift, 265 
Unitary group, 49 

Volume cross-sectional to orbit, 191 

Weibull error, 44, 45, 211, 263 
Wishart distribution, 156, 158, 181 


Test of significance, 18, 37, 64 



